


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1970 


Computer simulated learning: a digital 
Simulation of embedding field outstar networks. 


Frasier, Charles C. 


Massachusetts Institute of Technology 


http://ndl.handle.net/10945/14928 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed, and published, scholarly author.

Dudley Knox Library / Naval Postgraduate School

411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 










COMPUTER SIMULATED LEARNING: A DIGITAL SIMULATION OF EMBEDDING 
FIELD OUTSTAR NETWORKS
By
CHARLES C. FRASIER, JR.
B.S., MASSACHUSETTS INSTITUTE OF TECHNOLOGY
(1965)
SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE
DEGREES OF NAVAL ENGINEER
AND MASTER OF SCIENCE IN
ELECTRICAL ENGINEERING
at the 
MASSACHUSETTS INSTITUTE OF 
TECHNOLOGY 


June 1970 




COMPUTER SIMULATED LEARNING: A DIGITAL SIMULATION OF EMBEDDING 


FIELD OUTSTAR NETWORKS 


By 
CHARLES C. FRASIER


Submitted to the Department of Naval Architecture and
Marine Engineering and the Department of Electrical
Engineering on June 4, 1970 in partial fulfillment of the
requirements for the degrees of Naval Engineer and Master of
Science in Electrical Engineering.


ABSTRACT 


This thesis reports the results of digitally simulating outstar
embedding field learning networks. An outstar is a device that is
capable of inductively learning to associate the occurrence of a
command event with a pattern of events. Once this association is
learned, the outstar will reproduce the pattern whenever the command
event occurs.

A simple outstar was studied. It was found that a fast rate for
forgetting accumulated experience is necessary to maintain control
of the amplitudes of the outstar's responses. It was further found
that a fast rate for forgetting accumulated experience results in
poor noise resistance but good adaptability. A slow forgetting rate
results in good noise resistance but poor adaptability. The practical
aspects of thresholds were studied.

A laterally inhibiting outstar was studied. It was found that
the active process of lateral inhibition results in both good noise
resistance and good adaptability.

A short study of outstar avalanches was made. An outstar avalanche
is a cascade of outstars which can learn and reproduce time varying
patterns. It was found that a command node cascade avalanche does not
work well because of pulse lengthening. A "long axon with collaterals"
avalanche was studied.

A virtual laterally inhibiting outstar was studied.

A convenient method for analyzing new formulations for the z
process in an outstar was developed. A "generalized" learning process
was developed and studied.

The analogy between embedding field theory and the nervous system
of living organisms was introduced. The theoretical proposal that
learning on the neurophysiological level is due to the production of
transmitter in a synaptic cleft proportional to the correlation between
presynaptic and postsynaptic membrane potentials was used to
simplistically model a learning process for outstars.


Thesis Supervisor: Jan T. Young
Title: Professor of Electrical Engineering




TABLE OF CONTENTS

Abstract

1 EMBEDDING FIELD NETWORKS

1.1 Introduction
1.2 Illustrative Derivation of an Embedding Field Network
1.3 Generalized Embedding Fields

2 THE OUTSTAR AND THE OUTSTAR AVALANCHE EMBEDDING FIELD NETWORKS

2.1 Description of the Networks
2.2 Theoretical Work on Outstars and Outstar Avalanches
2.3 Approach to the Study

3 THE SIMPLE OUTSTAR

3.1 Specification of the Parameters for the Study
3.2 Experiment I - A Look at a Simple Outstar
3.3 A Simple Outstar with a "Fast" Forgetting Rate
3.4 Resistance to Random Mistakes vs. Correction of Learned Mistakes:
    A Philosophy for Learning in Outstars
3.5 The Occurrence of a Pattern of Events Over a Period of Time;
    Thresholds
3.6 Other Input Pulse Shapes

4 LATERAL INHIBITION

4.1 Introduction to Lateral Inhibition
4.2 Experimental Study of an Outstar with Lateral Inhibition
4.3 Advantage of Correcting a Learned Mistake with Lateral Inhibition
4.4 Further Remarks on Lateral Inhibition

5 THE OUTSTAR AVALANCHE

5.1 Introduction
5.2 A Simple Avalanche
5.3 A Laterally Inhibiting Avalanche

6 THE VIRTUAL LATERALLY INHIBITING OUTSTAR

6.1 Other Outstars which Control the Maximum Amplitudes of Grid Node
    Responses
6.2 Specifying the Parameters in a Virtual Laterally Inhibiting Outstar
6.3 Results of the Experiments with a Virtual Laterally Inhibiting
    Outstar
6.4 A Virtual Laterally Inhibiting Outstar with Thresholds and an
    Intermediate Forgetting Rate Designed to Learn Patterns of More
    than One Event
6.5 An Experiment with a Virtual Laterally Inhibiting Outstar with
    Thresholds and an Intermediate Forgetting Rate Designed to Learn
    Patterns of More than One Event

7 OTHER FORMULATIONS FOR THE z PROCESS

7.1 Introduction
7.2 A Description of the States of the Processes in an Outstar
7.3 Logics
7.4 Formulation of the z Process Conforming to Logic d 5
7.5 Specification of the Parameters in an Outstar Conforming to
    Logic &,
7.6 Experiments with a Directly Inhibiting Outstar
7.7 Generality of the Formulation of the z Process Conforming to
    Logic A.

8 THE CHEMICAL OUTSTAR

8.1 Introduction
8.2 The Analogy Between Embedding Field Networks and the Nervous
    System of Living Organisms
8.3 Summary of the Theoretical Proposal for the Neurophysiological
    Process of Learning in Living Organisms
8.4 A Simplistic Model for the Neurophysiological Phenomena in a
    Nervous Network Based on Embedding Field Theory
8.5 Experiments with the Simplistic Neurophysiological Model
8.6 Inhibition and an & Logic
8.7 The Chemical Outstar

Appendix A The Digital Simulation and its Accuracy

References



CHAPTER 1 EMBEDDING FIELD NETWORKS


Section 1.1 Introduction


Grossberg has developed a theory for learning called embedding
field theory. (Refs. 1 - 10) He has proposed several devices designed
in accordance with this theory to handle broad categories of learning
phenomena. These devices are inductive learning machines which are
governed by a set of deterministic equations. He has qualitatively
demonstrated their learning abilities. He has further drawn an
analogy between embedding field theory and the nervous system of
living organisms. Based on this analogy, he has made a concrete
proposal for the neurophysiological phenomena underlying learning
in living organisms.

By means of a digital simulation, this thesis experimentally
studies one embedding field device called an outstar, and it will
examine a combination of outstars called an outstar avalanche. The
analogy between the nervous system of living organisms and embedding
field theory will be introduced and examined.

For the uninitiated, we will begin by deriving the basic concepts
of embedding field theory from intuitive ideas about learning.





Section 1.2 Illustrative Derivation of an Embedding Field Network


Embedding field theory is a mathematical model for learning. To
gain an operational appreciation of this model, consider modeling the
following learning experiment:

An experimenter teaches a subject an arbitrary time sequential list
of letters of the alphabet by saying the list to the subject several
times. At the end of this instruction, the subject is requested to
repeat the list. If he can, then it is concluded that he has learned
the list.

In order for the subject to learn the list, the letters composing
the list must be familiar to him and must appear to be separate events.
One of the tasks of this experiment will be to teach the subject to
combine the separate letters of the alphabet into a new event which is
the list. We expect that after instruction, presentation of the first
letter of the list will automatically result in the subject expecting to
hear the succeeding letters of this list.

We begin our description by modeling the subject's state before the
experiment has begun. He is familiar with the letters of the alphabet
and recognizes them as separate events. We model this by assigning a
distinct point in space to each letter of the alphabet and calling
these points nodes. To denote recognition of a letter of the alphabet,
$A_i$, we assign a time varying process $x_i(t)$ to each node $V_i$.
$x_i(t)$ has the properties:

(a) $x_i(t) \approx 0$ when the letter $A_i$ has not been presented to
the subject recently.

(b) $x_i(t) > 0$ when the letter $A_i$ has been presented to the subject
recently.

As $x_i(t)$ indicates only the two conditions (a) and (b) above, we
may constrain $x_i(t)$ to be non negative.

We model the experimenter's ability to communicate with the subject
similarly. When the experimenter says the letter $A_i$ to the subject, a
non negative input pulse $P_i(t)$ is delivered to the appropriate node
$V_i$ in the subject. The pulse $P_i(t)$ has the properties:

(c) $P_i(t) > 0$ when the experimenter says $A_i$.

(d) $P_i(t) = 0$ at all other times.

It will require a small, but finite, time interval for the experimenter
to say $A_i$. $P_i(t)$ is non zero during this time interval.

We are now in a position to write a differential equation for $x_i(t)$:

eqn (1)   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t)$

Equation (1) was chosen to model the response of $V_i$ to presentation
of the letter $A_i$ because it is the simplest continuous representation
for $x_i(t)$ satisfying conditions (a) and (b) on $x_i(t)$.
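
The behavior required by conditions (a) and (b) can be checked numerically.
The sketch below integrates eqn (1), read here as dx/dt = -alpha*x + P(t),
with a forward-Euler step. The rectangular pulse, the parameter values, the
step size, and the name simulate_node are assumptions chosen for this
illustration only; they are not taken from the thesis.

```python
def simulate_node(alpha, pulse_start, pulse_end, pulse_height, t_end, dt=0.001):
    """Forward-Euler integration of dx/dt = -alpha*x + P(t) with x(0) = 0.

    P(t) is an assumed rectangular pulse of height pulse_height on
    [pulse_start, pulse_end) and zero elsewhere.
    """
    steps = int(round(t_end / dt))
    x = 0.0
    trace = []
    for n in range(steps):
        t = n * dt
        p = pulse_height if pulse_start <= t < pulse_end else 0.0
        x += dt * (-alpha * x + p)
        trace.append(x)
    return trace


trace = simulate_node(alpha=2.0, pulse_start=1.0, pulse_end=1.5,
                      pulse_height=1.0, t_end=6.0)
x_before = trace[500]   # t = 0.5, before the letter is said: condition (a)
x_peak = max(trace)     # shortly after the letter is said: condition (b)
x_late = trace[-1]      # long after the letter: back to condition (a)
```

In this sketch the trace stays at zero until the pulse arrives, rises while
the pulse is on, and then decays exponentially at rate alpha, so a larger
alpha shortens the period during which the letter counts as "recent".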

The experiment is now begun. The experimenter says a list
$A_1, A_2, \ldots, A_n$ to the subject. There will be a time interval,
$w$, between the presentation of each letter. For simplicity, we assume
that these time intervals are all the same.

At the beginning of the experiment the subject has no idea of what
the experimenter's list is. Therefore, when $A_1$ is presented the subject
can only guess, with probability 1/26 of success, what the experimenter's
selection for the second letter $A_2$ is. This carries throughout the list.
If the experimenter has presented letter $A_j$, the subject can only guess,
with probability 1/26 of success, what the $A_{j+1}$ letter is.

However, when the experimenter presents the list for the second time,
we expect the subject to be able to predict the succeeding letters of this
list with much greater accuracy. When the subject has learned the list,
he will be able to predict all the letters in the list, in their correct
order, with certainty.

We must now model this process.

Firstly, we have said that the subject has the ability to predict
what the succeeding letters of the list are, and this ability becomes
more successful after each presentation of the list.

Let us model this prediction process by connecting each of our nodes
to every other node with transmission lines which we shall call edges.
We allow the signal $x_i(t)$ from a node to travel away from that node along
the edges to each of the other nodes where it can act as input to these
nodes. The actual prediction of the letter following the ith letter is
modeled in the same manner as awareness of a letter being presented by
the experimenter. The appropriate $x_j(t)$ process is excited by the
prediction signals arriving via the edge from $V_i$.

The subject's ability to only blindly guess what each succeeding
letter is when the list is first presented means that equal prediction
signals are received at all nodes at the beginning of the experiment.
His ability to predict the entire list in the correct order after learning
means that after excitation of the $V_j$ node, a prediction signal is
received only at the correct $V_{j+1}$ node.

Prediction of the letters of the list in their correct order requires
that the $V_j$ node be excited by prediction signals before the $V_{j+1}$
node. To accomplish this, we constrain the prediction signals traveling
along the edges to a finite transmission velocity. That is, the signal
$x_j(t)$ originating at node $V_j$ arrives at node $V_{j+1}$ after a time
delay of $\tau_{j,j+1}$ time units.

Figure 1.2.1. Geometric schematic of nodes and directed edges.

The situation we have described so far is pictured in figure 1.2.1.
In figure 1.2.1 we have drawn the edge $e_{ij}$ as two directed edges,
$e_{ij}$ and $e_{ji}$, to stress that the lists $\ldots A_i A_j \ldots$ and
$\ldots A_j A_i \ldots$ are distinct. The arrowhead indicates the direction
of transmission along the directed edge.

Referring to figure 1.2.1 one can easily see how the subject predicts
the succeeding letters of the list after he has learned it. If he has
learned the list $\ldots A_i A_j \ldots$, excitement of $x_i(t)$ will result
in a signal traveling to $V_j$. It will arrive at $V_j$ $\tau_{ij}$ time
units later and $x_j(t)$ will be excited and a signal will be sent to $V_k$
and so on. For simplicity, we shall assume that all the transmission
delays are equal, or $\tau_{ij} = \tau$ for all i and j.

The effect of learning on the subject's prediction process is as
follows:

(e) Before learning, excitement of the $V_j$ node by presentation of
the $A_j$ letter results in equal prediction signals arriving at all nodes
to which $V_j$ is connected by edges $\tau$ time units after presentation of
the $A_j$ letter.

(f) After learning, excitement of the $V_j$ node by presentation of
the letter $A_j$ results in a large prediction signal being delivered to the
$V_{j+1}$ node from $V_j$ $\tau$ time units after presentation of $A_j$. No
prediction signals, or at least small prediction signals, are delivered to
the other nodes connected to $V_j$ by edges.

Now, we must develop a mechanism which converts the subject's
prediction process from state (e) to state (f) as the list is repeatedly
presented.





To develop this mechanism, we note that the experimenter is presenting
letters to the subject every $w$ time units. If $w$ is too small, say
1 millisecond, the subject will be unable to distinguish the separate
letters of the list and it will be impossible for him to learn the list.
On the other hand, if $w$ is too large, say 24 hours, we expect the subject
to have lost the context of the experiment. That is, if the experimenter
said "A" yesterday, and then says "C" today, we would not be surprised
if the subject responded, "See what?" Thus we do not expect the subject
to learn the list when $w$ is too large. In between these extremes we
expect the subject to do very well.

We now analyze this dependence of the subject's learning ability
on the presentation interval $w$.

If $w$ is large, say $w \gg \tau$, then the process $x_j(t)$ has long ago
decayed to zero before the next letter is presented to the subject and
$x_{j+1}$ becomes large. Additionally, the prediction signals from $V_j$ have
long since traveled to the ends of the edges from $V_j$, performed their
prediction excitement of the other nodes, and decayed. As $w$ is shortened
we begin to arrive at the situation where the prediction signal from $V_j$
arriving at the other nodes is still large when $V_{j+1}$ is excited by
presentation of the $A_{j+1}$ letter. When $w = \tau$ the signal from $V_j$
arriving at the other nodes exactly correlates with the $x_{j+1}$ process.
Making $w$ smaller yet, such that $w \ll \tau$, means that many nodes are
large when the prediction signals from any one of the excited nodes arrive
at any other.

It seems likely that the subject's learning ability is dependent
on the correlation between his prediction signal arriving at the $V_{j+1}$
node from the $V_j$ node and excitement of the $x_{j+1}$ process by
presentation of the $A_{j+1}$ letter. Assuming that this is the key to the
subject's learning ability, we may write down some properties for his
learning mechanism:

(g) If in one presentation of the list $\ldots A_j A_{j+1} \ldots$ the
prediction signal arriving at the $V_{j+1}$ node from the $V_j$ node is large
at the same time that the $x_{j+1}$ process is large, then on subsequent
predictions of the list a large prediction signal is delivered to $V_{j+1}$
from the $V_j$ node.

(h) If condition (g) is not met, then on subsequent predictions,
a small prediction signal is delivered to the $V_{j+1}$ node.

In conditions (g) and (h) we have gotten into some geometrical difficulty.
Previously we had decided that the prediction signal traveling along an
edge $e_{ij}$ is the $x_i$ process from the $V_i$ node suitably time delayed
to account for the finite transmission velocity. If this signal is allowed
to arrive at the $V_j$ node unchanged it will always be large $\tau$ time
units after excitement of $V_i$. Yet in conditions (g) and (h) we have
described a process which determines the amplitude of the prediction signal
being delivered to $V_j$ based on the past correlations between the
prediction signal and the $x_j$ process. The difficulty is that we must now
require $V_j$ to perform two functions: that of keeping track of recent
presentations to, or predictions by, the subject of the $A_j$ letter via the
$x_j$ process; and that of determining how vigorously the subject should
predict the $A_j$ letter based on past experience. The second of these
functions was placed at $V_j$ because it requires both the prediction
signal $x_i(t - \tau)$ and the $x_j$ process to be simultaneously available
for correlation.


Reference to figure 1.2.1 shows that besides $V_j$, the other place
where $x_i(t - \tau)$ and $x_j$ are simultaneously available for correlation
is the arrowhead of the $e_{ij}$ directed edge. In order to maintain one
function per element of figure 1.2.1, we shall locate a process, $z_{ij}$,
in the arrowheads of the directed edges with properties (g) and (h). This
simplifies things considerably, because we can make this process an
amplifier of prediction signals with the further properties:

(i) When $z_{ij}$ is large, a large prediction signal is delivered to
$V_j$ from $V_i$.

(j) When $z_{ij}$ is small, a small prediction signal is delivered to
$V_j$ from $V_i$.

A modification of eqn (1) is now in order to account for conditions
(i) and (j) above:

eqn (2)   $\dot{x}_j(t) = -\alpha x_j(t) + P_j(t) + \beta \sum_i z_{ij}(t) x_i(t - \tau)$

Considering conditions (e), (f), (g), and (h) we may formulate an
equation for $z_{ij}$ as a function of time.

Condition (e) implies that before the experiment begins,
$z_{ij}(t) \approx 0$. That is, the initial conditions on the $z_{ij}$ are:

$z_{ij}(0) \approx 0$

Conditions (f) and (g) imply that $z_{ij}(t)$ gets large only when the
predicting signal $x_i(t - \tau)$ and process $x_j(t)$ are large at the same
time, and that $z_{ij}(t)$ remains large for a long time afterward. That is:

$\dot{z}_{ij}(t) \sim x_i(t - \tau) x_j(t)$

Condition (h) implies that when $x_i(t - \tau)$ and $x_j(t)$ are not large
at the same time, then $z_{ij}(t)$ decays toward zero. That is:

$\dot{z}_{ij}(t) \sim -u z_{ij}(t)$

Combining all the above results, we have:

eqn (3)   $\dot{z}_{ij}(t) = -u z_{ij}(t) + x_i(t - \tau) x_j(t)$

with initial condition $z_{ij}(0) \approx 0$.
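
A small numerical sketch may help show what eqn (3) does with two given
signals. It reads eqn (3) as dz/dt = -u*z + x_i(t - tau)*x_j(t) with
z(0) = 0 and integrates it with a forward-Euler step. The rectangular
stand-in signals, the value of u, and the helper names integrate_z and box
are assumptions made for this illustration, not part of the thesis.

```python
def integrate_z(x_delayed, x_post, u, dt):
    """Forward-Euler integration of dz/dt = -u*z + x_i(t - tau)*x_j(t), z(0) = 0.

    x_delayed holds samples of the delayed prediction signal x_i(t - tau),
    x_post holds samples of x_j(t) on the same time grid.
    """
    z = 0.0
    trace = []
    for a, b in zip(x_delayed, x_post):
        z += dt * (-u * z + a * b)
        trace.append(z)
    return trace


dt = 0.01
n_steps = 1000


def box(start, end):
    """Assumed rectangular signal: 1 on [start, end), 0 elsewhere."""
    return [1.0 if start <= k * dt < end else 0.0 for k in range(n_steps)]


# Both signals large at the same time: z grows, then decays slowly (rate u).
overlapping = integrate_z(box(1.0, 2.0), box(1.0, 2.0), u=0.1, dt=dt)
# Signals never large at the same time: z stays at its initial value.
disjoint = integrate_z(box(1.0, 2.0), box(5.0, 6.0), u=0.1, dt=dt)
```

With a small u the overlapping case retains a sizable z long after both
signals have returned to zero, which is the "remains large for a long time
afterward" behavior asked for by conditions (f) and (g).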
Equations (2) and (3) are sufficient to describe the subject's
learning process and its dependence on the experimenter's presentation
interval $w$. If the experimenter presents the letters of the list with
a time interval between each letter of approximately $\tau$ time units, then
when the $A_{j+1}$ letter is presented, the prediction signal from the $V_j$
node has arrived at the arrowhead of the $e_{j,j+1}$ edge and the product
$x_{j+1}(t) x_j(t - \tau)$ is large. From eqn (3) $z_{j,j+1}(t)$ grows. On
subsequent repetitions of the list the same conditions are met and
$z_{j,j+1}(t)$ grows larger yet. On the other hand, $x_k(t)$ for the nodes
$V_k$, $k \neq j+1$, corresponding to letters of the alphabet other than
$A_{j+1}$ are small when $A_{j+1}$ is presented and from eqn (3),
$z_{jk}(t)$ decays toward zero for $k \neq j+1$.

When the subject is asked to recall the list, he uses his prediction
process, starting at the first letter $A_1$, and sequentially excites each
of the nodes corresponding to letters in the list in their correct
order by following the path of large $z_{ij}$'s until the end of the list
is reached. To prevent saddling ourselves with a cumbersome output
mechanism, we assume that the experimenter can read the amplitudes of
the $x_j(t)$ processes and considers a large $x_j(t)$ as a response by the
subject.

One can easily see that when $w \gg \tau$, none of the products
$x_i(t - \tau) x_j(t)$ are large and the subject learns nothing. On the other
hand when $w \ll \tau$, many nodes, $V_k$, are excited before the prediction
signals from the node associated with the first letter in the list
arrive at their corresponding arrowheads. Thus the associated $z_{1k}(t)$'s
grow large. This situation continues as the prediction signals from
the subsequent letters of the list arrive at their arrowheads. Called
upon to repeat the list, the subject's prediction process will equally
excite many nodes at the same time. To the subject, it will appear
that every letter of this list succeeds every other letter. Although
he has limited his guesses to the letters in the list, the subject is
no better off than he was at the beginning of the experiment in being
able to repeat the list.
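
The dependence on the presentation interval can be explored with a small
simulation of eqns (2) and (3), read here as
dx_j/dt = -alpha*x_j + P_j + beta*sum_i z_ij*x_i(t - tau) and
dz_ij/dt = -u*z_ij + x_i(t - tau)*x_j(t). The sketch below is only a rough
illustration: the forward-Euler scheme, the rectangular input pulses, every
parameter value, the rest period between repetitions, and the name
learn_list are assumptions, not values from the thesis.

```python
def learn_list(order, w, tau, n_nodes=4, alpha=5.0, beta=0.1, u=0.01,
               pulse_len=0.2, repeats=3, rest=5.0, dt=0.01):
    """Present the list `order` (node indices) `repeats` times; return z[i][j]."""
    delay = int(round(tau / dt))            # tau expressed in time steps
    cycle = len(order) * w + rest           # one presentation plus a rest
    steps = int(round(repeats * cycle / dt))
    x = [0.0] * n_nodes
    z = [[0.0] * n_nodes for _ in range(n_nodes)]
    # Ring buffer: slot n % delay holds x as it was `delay` steps ago.
    history = [[0.0] * n_nodes for _ in range(delay)]
    for n in range(steps):
        t_in_cycle = (n * dt) % cycle
        x_old = history[n % delay]          # x(t - tau)
        p = [0.0] * n_nodes
        for k, node in enumerate(order):    # letter k is said at time k*w
            if k * w <= t_in_cycle < k * w + pulse_len:
                p[node] = 1.0
        new_x = []
        for j in range(n_nodes):
            pred = sum(z[i][j] * x_old[i] for i in range(n_nodes) if i != j)
            new_x.append(x[j] + dt * (-alpha * x[j] + p[j] + beta * pred))
        for i in range(n_nodes):
            for j in range(n_nodes):
                if i != j:
                    z[i][j] += dt * (-u * z[i][j] + x_old[i] * x[j])
        history[n % delay] = x[:]           # becomes x(t - tau) later on
        x = new_x
    return z


matched = learn_list([0, 1, 2], w=1.0, tau=1.0)   # w equal to tau
crowded = learn_list([0, 1, 2], w=0.1, tau=1.0)   # w much smaller than tau
```

Under these assumed parameters, `matched[0][1]` ends up far larger than
`matched[0][2]`, so the path of large z's follows the list order, while in
the crowded case the weights out of the first node no longer single out the
correct successor.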


Section 1.3 Generalized Embedding Fields


The embedding field network derived in section 1.2 to model learning
of an alphabetic list is a specialized example of embedding field
networks. This particular network was derived because it illustrates
vividly the major ideas behind embedding field theory and its derivation
depends only upon intuitive ideas about learning. It is not the only
embedding field network which can learn time sequential lists and it
may not be the best network for this purpose. The alert reader may
have noticed that it cannot repeat a list in which a letter is repeated.
In addition to being dependent on the experimenter's presentation
interval $w$, its performance is highly dependent on the time delay $\tau$
and the parameters $\alpha$ and $u$ in eqns (2) and (3). It has other
problems, but remarkably, Grossberg has shown that these problems are
qualitatively similar to problems experienced by human subjects trying to
learn an alphabetic list. (The interested reader is referred to references
1 and 3 for a detailed analysis of networks similar to that derived in
section 1.2.)

However, the network of section 1.2 contains most of the elements
of embedding field theory and we shall pause here to list them. Figure
1.3.1 shows the pictorial representation of these elements.

(1) A node $V_i$ representing an elemental event which the network
is capable of recognizing and responding to.

(2) A directed edge $e_{ij}$ allowing transmission of signals at a
finite velocity in one direction from node $V_i$ to node $V_j$. Pictorially,
a directed edge is drawn as an arrow shaft with the arrowhead indicating
the transmission direction.

Figure 1.3.1. Elements of an embedding field network.
The process $x_i(t)$ is located at the $V_i$ node.
The process $x_j(t)$ is located at the $V_j$ node.
The process $z_{ij}(t)$ is located at the arrowhead $N_{ij}$.
The prediction signal $x_i(t - \tau)$ is arriving at the arrowhead $N_{ij}$.

(3) Arrowheads $N_{ij}$ representing the termination of directed
edge $e_{ij}$ on the node $V_j$. Because the directed edges transmit signals
without affecting them, it will not be necessary to reference signals
traveling along a directed edge until they reach the arrowhead. In
all subsequent equations in this paper, signals which have been
transmitted along a directed edge will be identified by the effect of the
transmission delay on them, i.e. $x_i(t - \tau)$.

(4) Input pulses $P_i(t)$ to node $V_i$ indicating the occurrence of the
elemental event represented by $V_i$ in the environment external to the
network. Input pulses will always be non negative and identically
zero except in a small time interval around the occurrence of event i.
It is assumed throughout this paper that $P_i(t)$ is immediately available
at $V_i$ whenever event i occurs. Because embedding field theory does not
deal with the input apparatus necessary to deliver inputs to nodes,
no geometric symbol has been developed for this purpose.

(5) A process $x_j(t)$ located at $V_j$ with the general formulation:

1.3.1   $\dot{x}_j(t) = a(t) + \sum_i b_{ij}(t - \tau) + P_j(t)$

The amplitude of $x_j(t)$ indicates whether the event represented by
$V_j$ has recently been observed or predicted by the network.

The term $a(t)$ is designed such that $x_j(t)$ always returns to some
ambient state indicative of no recent occurrence or prediction of event
j.

The term $\sum_i b_{ij}(t - \tau)$ is the effect of prediction signals on
$V_j$. The summation is taken over every arrowhead $N_{ij}$ impinging on
$V_j$; $b_{ij}(t - \tau)$ is the modified prediction signal received by $V_j$
from the arrowhead $N_{ij}$ impinging on it.

We will most frequently use the following formulations for these
functions:

$a(t) = -\alpha x_j(t)$

$b_{ij}(t - \tau) = \beta z_{ij}(t) x_i(t - \tau)$

With these formulations, equation 1.3.1 is:

$\dot{x}_j(t) = -\alpha x_j(t) + \beta \sum_i z_{ij}(t) x_i(t - \tau) + P_j(t)$

(6) A prediction signal modification process $z_{ij}(t)$ located in
the $N_{ij}$ arrowhead with the general formulation:

1.3.2   $\dot{z}_{ij}(t) = u(t) + f(x_i(t - \tau), x_j(t))$

The $z_{ij}(t)$'s are the memory of the network. In general $z_{ij}(t)$ will
correlate prediction signals $x_i(t - \tau)$ with the process $x_j(t)$
via function f, and deliver a suitably modified prediction signal
$b_{ij}(t - \tau)$ to $V_j$. The amplitude of $z_{ij}(t)$ is the network's
memory of how well $x_i(t - \tau)$ and $x_j(t)$ have correlated in the past.
The term $u(t)$ is the network's "forgetfulness". We will most frequently
use the following formulations for these functions:

$u(t) = -u z_{ij}(t)$

$f(x_i(t - \tau), x_j(t)) = v x_i(t - \tau) x_j(t)$

With these formulations, equation 1.3.2 is:

$\dot{z}_{ij}(t) = -u z_{ij}(t) + v x_j(t) x_i(t - \tau)$

Combining the geometric elements of figure 1.3.1 in various ways
and suitably defining the terms of eqns 1.3.1 and 1.3.2, Grossberg
has developed networks which qualitatively model many general categories
of learning phenomena. In addition to describing learning phenomena
on the psychological level as in section 1.2, Grossberg has drawn an
analogy between embedding field networks and nerve networks in living
organisms which is a concrete theoretical proposal for the
neurophysiological phenomena underlying learning in living organisms. (See
references 2 and 4.)

The power of embedding field theory is that it is a generalized
theory describing learning with deterministic equations. The equations
are simple enough to allow mathematical analyses and the establishment
of the conditions necessary for them to perform the tasks desired of
them. Due to the large number of nodes and arrowheads necessary to
model a particular learning phenomenon, exact analytic descriptions
of their performance are difficult. However, the basic simplicity of
the equations makes the simulation of their performance straightforward
on a high speed computer.





CHAPTER 2 THE OUTSTAR AND THE OUTSTAR AVALANCHE EMBEDDING FIELD
NETWORKS


Section 2.1 Description of the Networks


The embedding field network of section 1.2 was derived to illustrate
the concepts of embedding field theory. Combining the elements of his
theory in another way, Grossberg has proposed two very interesting
networks which this paper will study. The outstar network, and a
combination of outstars called an outstar avalanche, are networks capable
of learning and reproducing any number of complicated space-time
patterns.

Figure 2.1.1 presents the geometric schematic for an outstar and
the basic equations governing its performance. The N grid nodes
$V_1, V_2, \ldots, V_N$ represent the set of elemental events the network is
capable of recognizing. Each of the distinct combinations of elemental
events taken singly or several at a time is a distinct pattern.

The command node $V_c$ represents an event which always precedes a
particular pattern of grid elemental events. The function of the outstar
is to learn to associate the occurrence of the event associated with the
command node causally with the occurrence of the grid pattern. After
learning this "causal" association, the occurrence of the command node
event will result in the associated pattern occurring on the grid, even
though there are no external inputs to the grid nodes.

EQUATIONS GOVERNING NETWORK PERFORMANCE

2.1.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

2.1.2   $\dot{x}_i(t) = -\alpha x_i(t) + \beta z_{ci}(t) x_c(t - \tau) + P_i(t)$

2.1.3   $\dot{z}_{ci}(t) = -u z_{ci}(t) + v x_c(t - \tau) x_i(t)$

Figure 2.1.1. An outstar network and the equations governing its
performance.

Figure 2.1.2. An outstar avalanche and the equations governing its
performance. (Schematic labels: STARTING NODE, COMMAND NODE CASCADE,
GRID NODES.)

As an illustration, the outstar may be used to model a pianist
playing a piano from a score. Excitement of the $x_c$ process at the
command node represents the event of reading the notes associated with
a chord on his score. The grid nodes represent his fingers and a large
$x_i(t)$ is interpreted as the ith finger being lowered to strike a piano
key. A small $x_j(t)$ represents the jth finger being raised so as not
to strike a key. By practice the piano player will learn the proper
finger positions associated with the written chord in the musical
score. The outstar will learn the proper finger positions by reading
the chord on the score and having its fingers placed in the proper
positions sufficiently often. This finger pattern will be remembered by
large and small $z_{ci}(t)$'s at the appropriate arrowheads impinging on
the grid nodes. After having learned the association between the written
chord on the score and the finger pattern, both the pianist's and the
outstar's fingers will automatically assume the proper position when the
chord is read.
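
The learning and recall just described can be tried out numerically. The
sketch below integrates the outstar equations, read here as
dx_c/dt = -alpha*x_c + P_c, dx_i/dt = -alpha*x_i + beta*z_ci*x_c(t - tau) + P_i
and dz_ci/dt = -u*z_ci + v*x_c(t - tau)*x_i, with a forward-Euler step. The
pulse shapes, the timing of the grid pulses (tau after the command pulse),
all parameter values, and the name outstar_trial are assumptions made for
this illustration; the thesis's own parameter choices are not reproduced
here.

```python
def outstar_trial(z, grid_inputs, alpha=5.0, beta=1.0, u=0.01, v=1.0,
                  tau=1.0, pulse_len=0.2, trial_len=5.0, dt=0.01):
    """Run one trial; return (updated z list, peak response of each grid node).

    The command node receives a unit pulse at t = 0. Grid node i receives a
    pulse of height grid_inputs[i] at t = tau (all zeros means pure recall).
    """
    delay = int(round(tau / dt))
    steps = int(round(trial_len / dt))
    n = len(z)
    z = list(z)
    xc = 0.0
    x = [0.0] * n
    xc_history = [0.0] * delay              # ring buffer for x_c(t - tau)
    peaks = [0.0] * n
    for step in range(steps):
        t = step * dt
        xc_delayed = xc_history[step % delay]
        xc_history[step % delay] = xc
        pc = 1.0 if t < pulse_len else 0.0
        grid_on = tau <= t < tau + pulse_len
        new_x = []
        for i in range(n):
            p = grid_inputs[i] if grid_on else 0.0
            new_x.append(x[i] + dt * (-alpha * x[i]
                                      + beta * z[i] * xc_delayed + p))
            z[i] += dt * (-u * z[i] + v * xc_delayed * x[i])
        xc += dt * (-alpha * xc + pc)
        x = new_x
        peaks = [max(a, b) for a, b in zip(peaks, x)]
    return z, peaks


pattern = [1.0, 0.5, 0.0]       # "finger pressures" placed on the grid
z = [0.0, 0.0, 0.0]
for _ in range(20):             # practice: chord read, fingers placed
    z, _ = outstar_trial(z, pattern)
# Recall: the chord is read, but no external grid inputs are supplied.
z, recalled = outstar_trial(z, [0.0, 0.0, 0.0])
```

In this sketch the recalled peaks keep the relative proportions of the
practiced pattern, and the node that never took part in the pattern stays
silent, because each grid node's pair of equations is linear in its own
input amplitude.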

Figure 2.1.2 presents the geometric schematic of an outstar avalanche
and the basic equations governing its behavior. An outstar avalanche
is a cascaded series of outstars. Each outstar learns and is capable
of reproducing the pattern on the grid approximately $\tau$ time units
after its command node is excited. The command nodes are
deterministically cascaded. That is, excitation of the starting node
by an input will always result in a prediction signal going to the next
command node, which will send one to the command node after it, and so
on. There is no learning associated with this. The command node cascade
is an embedding field clock. Because the prediction signals travel
along directed edges at constant velocities, excitement of the starting
node results in a prediction signal arriving at the ith command node
after $(i - 1)$ inter-node delays. If a time varying pattern of
elemental events is being played on the grid, then each command node
takes a picture of that pattern when it is excited. Thus associating
the start of a particular time varying pattern, say a piano sonata,
with excitement of the starting node will result in a time sequential
series of pictures approximating that pattern being learned by the
network. If many command nodes are cascaded in this manner and the
delay between them is made sufficiently small, the sampled data
approximation of the pattern can be made arbitrarily close to the
pattern.
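The sampled data character of the avalanche can be illustrated with a
short sketch. The Python below is not part of the simulation reported in
this paper; the function names, the choice of test pattern, and the idea
of holding each picture until the next command node fires are
assumptions made only for illustration. It suggests how the worst gap
between a pattern and its most recent picture shrinks as the delay
between command nodes is reduced.

```python
import math


def avalanche_snapshots(pattern, num_nodes, delay):
    """Sample pattern(t) at the instants the cascaded command nodes fire.

    Hypothetical helper: node i (counting from 0) is assumed to fire
    i * delay time units after the starting node is excited.
    """
    return [pattern(i * delay) for i in range(num_nodes)]


def max_hold_error(pattern, duration, delay, probes_per_step=50):
    """Largest gap between pattern(t) and the most recent snapshot.

    Each snapshot is held (zero-order hold) until the next node fires.
    """
    num_nodes = int(round(duration / delay))
    snapshots = avalanche_snapshots(pattern, num_nodes, delay)
    worst = 0.0
    for i, held in enumerate(snapshots):
        for k in range(probes_per_step):
            t = i * delay + k * delay / probes_per_step
            worst = max(worst, abs(pattern(t) - held))
    return worst


coarse = max_hold_error(math.sin, duration=2.0, delay=0.5)
fine = max_hold_error(math.sin, duration=2.0, delay=0.1)
print(coarse > fine)
```

The comparison at the end uses a sine wave as a stand-in for one
component of a time varying pattern; any smooth pattern would serve.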







Section 2.2 Theoretical Work on Outstars and Outstar Avalanches

Grossberg has mathematically analyzed the pattern learning abilities
of outstars and outstar avalanches extensively (refs 7 and 8). In
the process of this analysis he developed particularly handy
mathematical descriptions of a pattern of elemental events, the pattern
learned by the outstar to approximate this pattern, and the pattern
reproduced by the outstar on its grid when predicting the elemental
event pattern.

An elemental event pattern is defined by the values of the input
pulses $P_i(t)$ at the grid nodes. Although their amplitudes may be
different, all input pulses have the same shape. We can describe the
relation of the ith input pulse to the other N - 1 inputs (consisting
of non zero pulses $P_j(t)$ indicating that event j is part of the
pattern and zero pulses $P_k(t)$ indicating that event k is not part of
the pattern) by forming the probability:

$$\theta_i = \frac{P_i(t)}{\sum_{k=1}^{N} P_k(t)} \quad \text{when any } P_k(t) \text{ comprising the pattern is non zero.}$$

The elemental pattern can be completely described by the N dimensional
vector

$$\vec{\theta} = (\theta_1, \theta_2, \ldots, \theta_N).$$

Note that this description of the pattern is amplitude independent.
That is, $\vec{\theta}$ defines the pattern whether that pattern is
presented vigorously or not. Additionally note that by the definition of
the $\theta_i$, $\vec{\theta}$ not only describes a pattern by the
occurrence or non occurrence of elemental events in it, but also by the
relative strength of the occurrence of those events. In the piano
playing example, this corresponds to describing the finger positions for
a chord by indicating which fingers are raised so as not to strike keys
and which fingers are lowered to strike keys, plus the relative pressure
each of the lowered fingers is to exert on the keys.

Since the $P_i(t)$'s have the same shape, and differ only in amplitude,
the $\theta_i$'s are constants during presentation of the pattern.
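The amplitude independence of this description can be checked with a few
lines of Python. This is only a sketch: the function name
`pattern_weights` and the sample amplitudes are assumptions introduced
here, and the normalization is simply the defining ratio of each input
amplitude to the sum of all input amplitudes.

```python
def pattern_weights(amplitudes):
    """Return the relative weight of each input pulse amplitude.

    Hypothetical helper: amplitudes[i] stands for the amplitude of the
    ith input pulse while the pattern is being presented.
    """
    total = sum(amplitudes)
    if total == 0:
        # The ratio is only defined when some pulse in the pattern is non zero.
        raise ValueError("pattern is empty: every input pulse is zero")
    return [a / total for a in amplitudes]


# The same chord presented softly and vigorously gives the same weights.
soft = pattern_weights([1, 3])
vigorous = pattern_weights([10, 30])
print(soft == vigorous)
```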

In a similar manner the outstar's response to presentation or
prediction of this pattern can be described by the probability vector:

$$\vec{X}(t) = (X_1(t), X_2(t), \ldots, X_N(t)), \quad \text{where } X_i(t) = \frac{x_i(t)}{\sum_{k=1}^{N} x_k(t)}.$$

The pattern learned by the outstar to approximate this pattern can be
described by the probability vector:

$$\vec{Y}(t) = (y_1(t), y_2(t), \ldots, y_N(t)), \quad \text{where } y_i(t) = \frac{z_{ci}(t)}{\sum_{k=1}^{N} z_{ck}(t)}.$$

Now suppose that the pattern $\vec{\theta}$ has been presented to the
outstar M times. Then Grossberg has proved that, starting with arbitrary
initial data for the $x_i(t)$'s and $z_{ci}(t)$'s:

(a) For every $M \geq 1$, the limits

$$Q_i^{(M)} = \lim_{t \to \infty} X_i^{(M)}(t) \quad \text{and} \quad R_i^{(M)} = \lim_{t \to \infty} y_i^{(M)}(t)$$

exist.
(b) For every $M \geq 1$ and for all times t after the last
presentation of the pattern, the probabilities $X_i(t)$ and $y_i(t)$ are
monotonic in opposite senses with $|y_i(t) - X_i(t)|$ non increasing,
and are constant on intervals where the prediction signal from the
command node is zero.

(c)

$$\lim_{M \to \infty} m_i^{(M)} = \lim_{M \to \infty} M_i^{(M)} = \theta_i$$

where $m_i^{(M)}$ is the minimum of $X_i^{(M)}(t_0)$ and
$y_i^{(M)}(t_0)$, $M_i^{(M)}$ is the maximum of $X_i^{(M)}(t_0)$ and
$y_i^{(M)}(t_0)$, and $t_0$ is the instant the last presentation of the
pattern was completed.

Thus by (a) - (c),

$$\lim_{M \to \infty} \lim_{t \to \infty} X_i^{(M)}(t) = \lim_{M \to \infty} \lim_{t \to \infty} y_i^{(M)}(t) = \theta_i.$$


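(d) The functions $f_i^{(M)}(t) = y_i^{(M)}(t) - X_i^{(M)}(t)$ and
$g_i^{(M)}(t) = X_i^{(M)}(t) - \theta_i$ change sign at most once, and
not at all if $f_i^{(M)}(0)\, g_i^{(M)}(0) \geq 0$. Moreover,
$f_i^{(M)}(0)\, g_i^{(M)}(0) > 0$ implies
$f_i^{(M)}(t)\, g_i^{(M)}(t) > 0$ for all $t \geq 0$.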

Interpreting these results, we see that (c) implies that the
network's memory of the pattern and its predictions of the pattern
converge to the pattern as the number of times the pattern is
presented increases, or "practice makes perfect". (a) and (b) ensure
that the network's memory of and prediction of the pattern after the
last presentation of the pattern will get no worse than it was
immediately after that last presentation. (d) shows that there is at
most one oscillation in the convergence and therefore the network's
learning ability is stable.

An additional benefit of result (c) is that if the network started
associating one pattern with the command node event and it is decided
that association is an error, then a new pattern, the correct one,
may be learned over the old one with sufficient practice. That is,
all errors are correctable.

Section 2.3 Approach to the Study


Grossberg's theoretical results greatly enhance the attractiveness
of outstars and avalanches as devices for modeling certain categories
of learning phenomena. As qualitative models they have wide application
(see refs 6 and 9). However, beyond the qualitative insight that they
provide, are they practical? The mathematics guarantees that an
avalanche will learn a piano sonata with sufficient practice. If
sufficient practice means forty years, we would do well to look for
another model - not because they do not work, but because they do not
work well enough.

Thus the question "How well do they work?" is pertinent. This
is the question that this paper addresses. It is a practical question,
and outstars and avalanches are considered as practical devices that
learn throughout the rest of this paper.

In order to accomplish this study, a digital simulation of the
networks was programmed onto a computer. The details of this simulation
and an evaluation of its accuracy are provided in appendix A. All
attempts were made to reduce the artificialities and errors introduced
by this method of study. However, constraints were forced on the study
by the digital simulation, and these constraints will be noted and
explained as they occur in this paper.

As an outstar avalanche is a cascade of outstars, the primary
emphasis of this study is on outstars. In studying the outstars,
attention is devoted to the possible interactions of one outstar in an
avalanche with another. Where avalanches are presented, they are more
or less used as tests to confirm the conclusions established while
studying the outstars composing them.

CHAPTER 3 THE SIMPLE OUTSTAR 


Section 3.1 Specification of Parameters for the Study

The geometric schematic and equations in figure 2.1.1 describe
the simplest outstar. The equations are repeated here for easy
reference:

$$\dot{x}_c(t) = -\alpha x_c(t) + P_c(t) \qquad (3.1.1)$$

$$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\, x_c(t - \tau) \qquad (3.1.2)$$

$$\dot{z}_{ci}(t) = -u z_{ci}(t) + v x_i(t)\, x_c(t - \tau) \qquad (3.1.3)$$

In order to study this outstar, we must assign numbers to the constants
$\alpha$, $\beta$, u, v, and $\tau$; initial conditions must be assigned
to the variables $x_c$, $x_i$, and $z_{ci}$; shape and amplitude for the
inputs $P_c$ and $P_i$ must be selected; and the number of pattern
nodes, N, must be specified. Additionally, the test pattern to be taught
to the outstar must be decided upon.
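To make the role of each constant concrete, equations 3.1.1 - 3.1.3 can
be stepped forward in time numerically. The sketch below is not the
simulation described in appendix A; it assumes a simple forward Euler
step of size `dt`, zero initial x values, and a delayed command signal
that is zero before the network has any history. The function name
`simulate_outstar` and its argument names are introduced here for
illustration only.

```python
def simulate_outstar(p_c, p_grid, z0, alpha, beta, u, v, tau, t_end, dt=0.001):
    """Forward-Euler integration of equations 3.1.1 - 3.1.3 (a sketch).

    p_c    : function t -> command input P_c(t)
    p_grid : list of functions t -> grid input P_i(t), one per grid node
    z0     : list of initial values z_ci(0), one per grid node
    Returns (times, x_c history, list of x_i histories, list of z_ci histories).
    """
    n = len(p_grid)
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))          # delay tau expressed in time steps
    times = [0.0]
    x_c = [0.0]                          # assumed x_c(0) = 0
    x = [[0.0] for _ in range(n)]        # assumed x_i(0) = 0
    z = [[z0[i]] for i in range(n)]
    for k in range(steps):
        t = k * dt
        # Delayed prediction signal x_c(t - tau); taken as zero for t < tau.
        x_c_delayed = x_c[k - lag] if k >= lag else 0.0
        for i in range(n):
            xi, zi = x[i][k], z[i][k]
            # Equation 3.1.2: decay, external input, and gated prediction signal.
            x[i].append(xi + dt * (-alpha * xi + p_grid[i](t) + beta * zi * x_c_delayed))
            # Equation 3.1.3: forgetting plus correlation of x_i with the delayed x_c.
            z[i].append(zi + dt * (-u * zi + v * xi * x_c_delayed))
        # Equation 3.1.1: command node driven only by its own input.
        x_c.append(x_c[k] + dt * (-alpha * x_c[k] + p_c(t)))
        times.append((k + 1) * dt)
    return times, x_c, x, z
```

A call needs one input function per node, for example
`simulate_outstar(p_c, [p_1, p_2, p_3], [0.1, 0.0, 0.0], ...)`, with the
constants chosen as discussed in the remainder of this section.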

A great deal of experimental time can be saved if these parameters
are specified in a somewhat rational way. A rationale can be developed
for any method of specifying the parameters, so we shall arbitrarily
begin with the inputs.

Firstly, the inputs are only used to indicate the occurrence of
elemental events external to the outstar. All we require of them is
that they be non negative in an interval around the occurrence of the
elemental event and zero at all other times. Also, we would like them
to reflect the strength of presentation of the events they represent.
For a first try we will make them identical in shape, duration and
amplitude for both the grid inputs $P_i(t)$ and the command input
$P_c(t)$.

An impulse might be a good shape for them, but there might be effects
associated with duration that would be interesting to see. On the
other hand, if we want to analytically check our results, then we want
the inputs' shape to be simple enough to make the analysis tractable.
A rectangular pulse of amplitude A and duration $\delta$ is suitable.
Note that with this selection for inputs we have implied that our input
apparatus is a digital sampling device which samples the continuous
variation of events in the external environment at time $t_0$, sets the
inputs to nodes corresponding to events present in the environment
at $t_0$ to value A, and holds these values until the next sample is
taken at time $t_0 + \delta$. If we recall that an avalanche performs a
similar digital approximation to time varying events, this selection
for inputs is not too bad.
As the direct response to the inputs is linear, we may leave the
amplitude, A, of the input pulses arbitrary. In selection of the
duration $\delta$, we run into a compromise with the digital simulation.
An accurate simulation of the response to a long duration pulse requires
considerable computation time. Thus to minimize computation time,
$\delta$ should be short. Yet the pulses were given a finite duration to
study possible effects of duration, so we do not want $\delta$ to be too
short. With this trade off in mind, a good selection for $\delta$ would
be the shortest rise time in the outstar. The rise times of the outstar
are $1/\alpha$ for the x processes at the nodes, and $1/u$ for the z
processes at the arrowheads. u is the "forgetting rate" of the outstar
and it would be expected that the forgetting rate of the outstar should
be slower than the response rate, $\alpha$, of the x processes.
Therefore it is reasonable that $\alpha$ should be greater than u. This
implies that $1/\alpha$ is the shortest rise time in the outstar and we
shall set $\delta = 1/\alpha$.

The x processes at the command node and the grid nodes indicate
the recent presentation to or prediction by the outstar of events. At
the beginning of a learning experiment it is reasonable to assume that
there have been no recent presentations or predictions of the events to
be learned. The initial conditions for the x processes can be assumed
zero, i.e. $x_c(0) = 0 = x_i(0)$ for all i.

The response rate $\alpha$ of the x processes has already been
specified as $\alpha = 1/\delta$. Thus all the parameters for the
command node's $x_c$ process have been specified. For the grid nodes,
$\tau$, $\beta$, and the initial conditions on the z's still must be
specified. To save computation time, $\tau$ should be small. As there is
no feedback from the grid to the command node, there is no necessity for
$\tau$ to be non zero in this simple outstar. In a digital simulation,
however, the accuracy is improved if there is a time delay between
simultaneous processes, and making $\tau > 0$ is advantageous. A
suitable selection for $\tau$ is $\tau = \delta$.


From equation 3.1.2 it can be seen that $\beta$ and $z_{ci}(t)$
determine the amplitude of the prediction signal being admitted to grid
node $v_i$. As the outstar's memory is the $z_{ci}(t)$'s, it is the most
important factor in this prediction signal amplitude determination.
Setting $\beta = 1$ will make analyzing the effects of the z's on the
prediction signals easier.

The parameters associated with the z processes, u, v, and initial
conditions $z_{ci}(0)$ must be specified. u is the "forgetting rate" of
the outstar. As we want the outstar to remember what it has learned,
we want u to be small. Remembering that computation time is scarce,
a small u for this experiment is anything such that the decay time
$1/u$ of the z processes is several times longer than the length of the
experiment.
Selecting v is a problem. As can be seen from equation 3.1.3,
v determines the rise rate and amplitude of the z process given an
$x_i(t)$ response and the prediction signal $x_c(t - \tau)$. In
presenting a pattern to the outstar to be learned, the best learning
should occur when the inputs to the grid nodes are presented at the same
time as the prediction signals from the command node arrive at the
arrowheads. The problem is, in this situation, how well should the
outstar learn the pattern on the first presentation? To answer this
question, we need some way of measuring how well the outstar has learned
a pattern after presentation.

A tentative operational measurement would be to say that the outstar
has learned a pattern well when the prediction process drives the
amplitudes of the grid node x processes to at least the same values as
they are driven to by the event inputs. Using this measurement we
can specify v's which result in well learning in one presentation or
two presentations and so on.

However, this does not end the problem associated with "rationally"
selecting an initial v for an experiment. Suppose we specify a v
which results in well learning in one presentation. What value should
this v have? A rational selection of an initial v requires solving
the outstar equations. The reason why the outstar is being simulated
is the difficulty of analytically solving these equations. To avoid
these difficulties, the procedure taken in this study was to specify
all other parameters in the outstar including the number of
presentations required for well learning. A guess is then made for a v
and an experiment is performed to see what amplitude the prediction
process will drive the grid nodes to after one pattern presentation.
The guessed v is then appropriately scaled to result in the specified
well learning criterion.

For the current experiment, v was selected to result in well
learning in two pattern presentations.

Concerning the initial conditions for the z processes, we expect
on the first presentation of the pattern that the network has not
previously learned anything about the pattern. That is, $z_{ci}(0) = 0$
for all i. However, we would like to see what happens if one of the
$z_{ci}$'s is not zero at the beginning of the experiment. Therefore we
will make one of the $z_{ci}(0)$ non zero, but small.

Only the number, N, of grid nodes and the test pattern to be
taught the outstar remain to be specified. As we are only performing
this experiment as an initial look at an outstar, a good test pattern
would be presentations of one event which the outstar should learn
to associate with the command event. An additional event presented
at a time well removed from arrival of prediction signals from the
command node would be a good way to test interference between outstars
in an avalanche. As v was selected to result in well learning in two
presentations, this test pattern will be presented twice and then a
prediction will be called for to see how well the pattern has been
learned.

This gives us two grid nodes. A third grid node is included to
study the effects of the non zero initial conditioned z processes.
No inputs will be given to this grid node.

We need now only to assign numbers to the parameters in accordance
with the above specifications:

Geometric parameters:

  N = number of grid nodes = 3
  $\tau$ = time delay of prediction signal = 0.3 sec.

Input parameters:

  Input pulse shape is rectangular
  A = input pulse amplitude = 10
  $\delta$ = input pulse duration = 0.3 sec.
  Input pulses will be delivered to the command node, $v_c$, at times:
  0.1 sec., 1.9 sec., and 3.7 sec.
  No input pulses will be delivered to grid node $v_1$.
  Input pulses will be delivered to grid node $v_2$ at times: 0.4 sec.
  and 2.2 sec.
  Input pulses will be delivered to grid node $v_3$ at times: 1.0 sec.
  and 2.8 sec.

Network parameters:

  $\alpha$ = rate constant of x process = 3.3333 sec$^{-1}$
  $\beta$ = prediction signal amplification constant = 1.0
  u = "forgetting rate" = 0.01 sec$^{-1}$
  v = correlation amplification constant = 1.6 (satisfies well
  learning in two presentations criterion)

Initial conditions:

  $x_c(0) = x_i(0) = 0$ for all i
  $z_{c1}(0) = 0.1$
  $z_{c2}(0) = z_{c3}(0) = 0$
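The input schedule above can be written down compactly as a set of
rectangular pulse trains. The following Python sketch only restates the
listed start times, amplitude and duration; the names `rect_pulse_train`
and `EXPERIMENT_I_INPUTS` are assumptions introduced for illustration
and do not come from the simulation program of appendix A.

```python
def rect_pulse_train(t, start_times, amplitude=10.0, duration=0.3):
    """Value at time t of a train of rectangular pulses.

    Each pulse has the given amplitude from its start time until
    start + duration, and the input is zero at all other times.
    """
    for start in start_times:
        if start <= t < start + duration:
            return amplitude
    return 0.0


# Pulse start times in seconds, copied from the parameter list above.
EXPERIMENT_I_INPUTS = {
    "v_c": [0.1, 1.9, 3.7],   # command node: two instructions, then a recall test
    "v_1": [],                # no inputs; only its z process starts non zero
    "v_2": [0.4, 2.2],        # event 2, one delay (0.3 sec.) after the command event
    "v_3": [1.0, 2.8],        # event 3, well removed from the prediction signal
}

print(rect_pulse_train(0.5, EXPERIMENT_I_INPUTS["v_2"]))
```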


The above lengthy description of the reasons for selection of the
parameters for the experiment to be presented in the next section was
provided as an illustration of the decisions that must be made when
performing the experiments in this study. Except where noted, in the
future the same reasoning will underlie the selection of parameters
for experiments.

Section 3.2 Experiment I - A Look at a Simple Outstar

Figure 3.2.1 shows the results of the experiment outlined in
section 3.1. The inputs to the nodes are plotted on the same trace as
the x process response of the nodes.

A striking feature of figure 3.2.1 is that the x process node
responses all have amplitudes significantly less than the amplitudes
of the input pulses. It can be seen that this is as it should be
if we consider the equation governing the response of a node to an
input only:

$$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t)$$

The solution of this equation for a rectangular input pulse of
amplitude A and duration $\delta$, starting at t = 0 with
$x_i(0) = 0$, is:

$$x_i(t) = \begin{cases} (A/\alpha)\,(1 - e^{-\alpha t}) & \text{for } 0 \leq t \leq \delta \\ (A/\alpha)\,(1 - e^{-\alpha \delta})\, e^{-\alpha (t - \delta)} & \text{for } t > \delta \end{cases}$$

The maximum of this response occurs at $t = \delta$. For the parameters
specified for this experiment, the maximum amplitude of an $x_i(t)$
response to an input pulse only is:

$$\max x_i(t) = (A/\alpha)\,(1 - e^{-1}) \approx 1.9$$

which is about 20% of the amplitude of the input pulses.
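The figure of about 1.9 can be checked directly from the closed form
solution. The Python sketch below evaluates that solution with the
parameters of section 3.1 (A = 10, a pulse duration of 0.3 sec., and a
rate constant equal to the reciprocal of the duration); the function
name `pulse_response` is an assumption introduced here, and the pulse is
taken to start at t = 0 with a zero initial condition.

```python
import math


def pulse_response(t, amplitude, alpha, duration):
    """Response of one x process to a single rectangular input pulse.

    Rises toward amplitude / alpha while the pulse is on, then decays
    exponentially from the value reached at the end of the pulse.
    """
    if t < 0:
        return 0.0
    if t <= duration:
        return (amplitude / alpha) * (1.0 - math.exp(-alpha * t))
    peak = (amplitude / alpha) * (1.0 - math.exp(-alpha * duration))
    return peak * math.exp(-alpha * (t - duration))


A, DURATION = 10.0, 0.3
ALPHA = 1.0 / DURATION

peak = pulse_response(DURATION, A, ALPHA, DURATION)
one_duration_later = pulse_response(2 * DURATION, A, ALPHA, DURATION)
print(round(peak, 3), round(peak / A, 2), round(one_duration_later / peak, 3))
```

The last printed ratio is the fraction of the peak that remains one
pulse duration after the pulse ends, which is the quantity that matters
for the "tail" effect discussed later in this section.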

The pattern we intended to teach to the outstar was to associate
the occurrence of the command event with event 2. The outstar was
instructed in this pattern twice by presenting the command event to it
and then presenting event 2 to it $\tau$ time units later. This can be
seen from the command input trace and grid node $v_2$'s input trace.
After the instruction was over, the command event alone was presented
to see if the outstar had learned the pattern. As can be seen from
the third response, the outstar did predict event 2 and we can consider
that it has learned the pattern.

Figure 3.2.1. The results of experiment I - an initial look
at a simple outstar.

[Figure: input P(t) and response x(t) traces for the nodes, plotted
against time, with a vertical scale of 0 to 10.]

This experiment was also designed to see what effect a small non
zero initial condition on a z would have. Thus $z_{c1}$ was given the
initial value of 0.1 while $z_{c2}$ and $z_{c3}$ were given zero initial
values. As can be seen from the $x_1$ response trace, the small non zero
initial value for $z_{c1}(0)$ had no perceptible effect. $x_1(t)$ did
respond to the prediction signals, but the response was so small that it
does not show on the scale chosen for figure 3.2.1.

We gave the input pulses a finite duration to see if there would
be any effects associated with this duration. Such a duration effect
is the fact that the x responses reach a maximum at the end of the input
pulses and then decay exponentially away from this maximum. This effect
is entirely due to the shape selected for the input pulses and the
exponential response of the x processes. If we accept the sampled
data input apparatus described in section 3.1 as the input apparatus
for the outstar, then this effect has important consequences. It says
that the outstar's response to a sample taken at time $t_0$ extends,
with large amplitude, into the next sampling period starting at
$t_0 + \delta$ and beyond. In this experiment, we selected the inputs to
$v_3$ to occur $2\delta$ after the inputs to $v_2$. As explained above,
the inputs to $v_2$ were selected to result in maximum learning. From
the trace for $x_3(t)$ it can be seen that event 3 was also learned to
be associated with the command event, although to a much lesser extent.
This resulted from the "tail" of the prediction signal still being
reasonably large when event 3 occurred. The product
$x_3(t)\, x_c(t - \tau)$ was therefore sufficient to cause $z_{c3}$ to
grow, as can be seen from $z_{c3}(t)$'s trace. Thus when the outstar was
tested to see what it had learned, it predicted event 3 as well as
event 2.

Thus the "tail" duration effect will result in the outstar learning
not only what happens in the sample in which prediction signals arrive
from the command node, but also in the sample taken after that. By
symmetry, it will learn the samples taken before in the same way.
We will mark this effect for further study.

Another effect to note in figure 3.2.1 is that $z_{c2}(t)$ grew with
each presentation of the pattern and on the recall test. Because u
was chosen small, $z_{c2}(t)$ did not decrease and essentially acted as
an integrator of $v x_2(t)\, x_c(t - \tau)$. The effect of the growing
$z_{c2}(t)$ can be seen in the trace for $x_2(t)$ where the $x_2$
response increases in amplitude on each presentation or prediction. If
this growth continues, we could expect $x_2$ responses to get
impractically large. Experiment I was continued and the $x_2(t)$
responses did continue their growth. Figure 3.2.2 shows this
continuation and it can be seen from the trace for $x_2(t)$ that the
$x_2$ responses continued to grow on predictions only. Not only are the
$x_2$ responses growing with each prediction, but a quick look at the
$z_{c2}(t)$ trace will show that they are growing at an increasing
rate.
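The integrator behavior follows from equation 3.1.3 by the standard
variation of constants formula. The step below is offered as a worked
check, not as part of the original analysis; it treats $x_2$ and $x_c$
as known functions of time and assumes they are never negative, which
holds here because the inputs and initial values are non negative.

```latex
z_{c2}(t) = z_{c2}(0)\, e^{-ut}
          + v \int_0^t e^{-u(t-s)}\, x_2(s)\, x_c(s-\tau)\, ds
% With u = 0.01 per second and elapsed times of a few seconds, the
% weighting factor stays close to one (for example e^{-0.05} \approx 0.95
% over 5 seconds), so
z_{c2}(t) \approx z_{c2}(0) + v \int_0^t x_2(s)\, x_c(s-\tau)\, ds
```

Since the integrand is never negative, $z_{c2}$ can only accumulate on
this time scale, and every prediction adds to it because $x_2$ itself
is driven by $z_{c2}$ through equation 3.1.2.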

Experiment I was continued not only to study the growth of $x_2$
responses but also to test the theoretical prediction that outstars
are capable of correcting all mistakes. An attempt was made to correct
two types of mistakes in the continuation. It was decided to consider
the already learned association between the command event and event 2
as a mistake and that the correct association should be with event 3.
Therefore, event 3 was presented $\tau$ time units after presentation of
the command event three times. Event 2 was not presented at all.

The second type of mistake was simulation of a "random" mistake
by presenting event 1 once $\tau$ time units after presentation of the
command event.

Figure 3.2.2. Continuation of experiment I.

[Figure: input, x response and z process traces plotted against time
(secs).]

The results are interesting. Due to their growth, $x_2$ responses
continued to be greater than $x_1$ and $x_3$ responses. The $x_3$
responses were catching up with the $x_2$ responses, but from the
$z_{c2}(t)$ and $z_{c3}(t)$ traces, it can be seen that it will require
many presentations of the $v_c \to v_3$ association before $x_3$
responses will reach a point where we could say that the $v_c \to v_2$
mistake is corrected.

From $x_1$'s trace it can be seen that the "random" mistake was
remembered by the outstar. It was also predicted with increasing
amplitude on subsequent predictions. However, with the results of
experiment I plotted as they are in figure 3.2.2 it is difficult to see
if any mistakes were corrected. The theoretical prediction that all
mistakes could be corrected involved the convergence of the
probabilities $X_i(t)$ and $y_i(t)$ to $\theta_i$. Translating the data
from figure 3.2.2 to these probabilities, we have the following results:

Table 3.2.1. Translation of data from figure 3.2.2 to probabilities
suitable for comparison to the theoretical prediction that an outstar
can correct all mistakes.

Response number, M         0        1        2        3

v_1    $\theta_1$          0        0.5      0        0
       $X_1$               0        0.188    0.083    0.097
       $y_1$               0.007    0.083    0.091    0.101

v_2    $\theta_2$          1.0      0        0        0
       $X_2$               0.892    0.563    0.625    0.612
       $y_2$               0.886    0.75     0.682    0.632

v_3    $\theta_3$          0        0.5      0        0
       $X_3$               0.107    0.249    0.292    0.319
       $y_3$               0.107    0.167    0.227    0.265


The M = 0 response column is the results from the last response
in figure 3.2.1 and is the initial data that the continuation of
experiment I began with. The M = 1 response column begins the attempt to
correct the mistake $v_c \to v_2$ to $v_c \to v_3$ and includes the
"random" mistake of presenting event 1. The M = 2 and M = 3 response
columns are the continuing effort to correct $v_c \to v_2$ to
$v_c \to v_3$ without "random" mistakes.

Except when the random mistake occurred, $X_1$ and $y_1$ remain small
and about the same magnitude as the duration effect "error" of event 3
in the first part of experiment I. We conclude that a "random" mistake
affects the memory of the outstar to a small extent.
Table 3.2.1 does show that the $v_c \to v_2$ mistake is being corrected
to $v_c \to v_3$, as $X_2$ and $y_2$ are decreasing while $X_3$ and
$y_3$ are increasing. However, from the numbers we can conclude that it
will require many presentations of the $v_c \to v_3$ pattern before the
magnitudes of $X_3$ and $y_3$ exceed $X_2$ and $y_2$, and many more
presentations of $v_c \to v_3$ before $X_3$ and $y_3$ bear the same
relation to $X_2$ and $y_2$ as $X_2$ and $y_2$ had to $X_3$ and $y_3$
in the M = 0 response. In the meantime, it could be expected
that the x response will have become unrealistically large.

The uncontrollable growth of the x responses makes this outstar
an unattractive device. Although it conforms to the theoretical
predictions, the actual means by which we measure its performance is
the x response and not the X probabilities. The growing x responses
mean that in our piano playing example, this outstar will be punching
holes through the keyboard of the piano with its fingers when it plays
a frequently used chord. Thus, to make this a useful device, we must
find some means of limiting the x responses at a practical amplitude.
As we pointed out, the growth of the x responses was due to the growth
of the $z_{ci}$ process which determines the amplitude of prediction
responses. We had chosen the "forgetting rate" u of the $z_{ci}$
processes to be small. At the time we did so, it seemed reasonable to
have the outstar forget slowly. However, non decaying $z_{ci}$ processes
have led us to an undesirable situation. We will therefore try to
control the amplitude of the x response by increasing the "forgetting
rate".

Section 3.3 A Simple Outstar with a "Fast" Forgetting Rate

The forgetting rate of experiment I was selected to be "slow"
relative to the time scale of experiment I. In experiment I, the
characteristic decay time, $1/u$, for the $z_{ci}$ process was 100
seconds, which was long compared to the 11 seconds total length of the
experiment. In that 11 seconds, the network was asked to learn one
pattern and then to correct it. The time between presentations and/or
predictions was 1.8 seconds. Thus, when we speak of a "fast" forgetting
rate, we must decide "fast relative to what?".

To conserve computation time, we shall make the forgetting rate
fast relative to the presentation and/or prediction time interval,
i.e. $1/u = 1.8$ seconds, or $u = 0.556$ sec$^{-1}$. This leads us into
another problem. The v of experiment I was selected on the "two
presentations mean well learning" criterion. That is, the $z_{ci}$
process would get large enough in two presentations of the pattern so
that a prediction following these presentations would drive the
amplitudes of the x processes to the same values as the input pulses
alone would drive them. If we expect the network to forget in a time
comparable to the presentation interval, it would be better to change v
such that it conformed to a "one presentation means well learning"
criterion. We will therefore double v to v = 3.2.
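The difference between the two forgetting rates can be put in numbers by
asking what fraction of a z process survives one 1.8 second interval
when nothing drives it. The Python sketch below assumes pure exponential
decay between presentations, that is, it ignores the correlation term of
equation 3.1.3; the function name `retained_fraction` is introduced here
for illustration.

```python
import math


def retained_fraction(u, interval):
    """Fraction of a z process left after free decay at rate u for the interval."""
    return math.exp(-u * interval)


INTERVAL = 1.8  # seconds between presentations and/or predictions

slow = retained_fraction(0.01, INTERVAL)            # experiment I forgetting rate
fast = retained_fraction(1.0 / INTERVAL, INTERVAL)  # forgetting rate chosen here

print(round(slow, 3), round(fast, 3))
```

Under this assumption the slowly forgetting outstar keeps nearly all of
its memory from one presentation to the next, while the fast forgetting
one keeps only about a third, which is the motivation for asking one
presentation to do the work that two did before.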

To compare the fast forgetting rate outstar to the slow forgetting
rate outstar of experiment I, we shall re-perform the first part of
experiment I with all other parameters specified as they are in section
3.1. This experiment will be called experiment II.

Figure 3.3.1. The results of experiment II - a simple outstar
with a fast forgetting rate.

[Figure: input, x response and z process traces plotted against time
(secs).]

Figure 3.3.1 shows the results of this experiment. Because
the responses of this experiment were smaller than in experiment I,
the vertical scale for the x traces was doubled in figure 3.3.1.

As can be seen, we have managed to reasonably control the
amplitudes of the x responses by allowing the z's to decay between
excitements of the command node. At least the z's do not exhibit the
monotonic growth they did in experiment I. The intended association
$v_c \to v_2$ was learned well. Again there is some learning of
$v_c \to v_3$ due to the "tails" of the prediction signal. The non zero
initial condition on $z_{c1}$ produced no perceptible effect. It can be
concluded that the outstar performs very well over short periods of
time. However, with its memory decaying rapidly, how long will its
memory persist?

This question hits upon one of the key features of an outstar.
The mathematical theorem concerning outstars states that the outstar's
memory of a pattern remains unimpaired for all time after the last
presentation of the pattern, provided no new or random pattern is
presented to it subsequently. Of course, in the language of the theorem,
this meant that the $y_i$'s would not change even though the $z_{ci}$'s
were decaying exponentially. It looks like a fast forgetting outstar has
the opposite problem from the slowly forgetting one. That is, the
responses, while retaining the proper $X_i$ probabilities to define the
pattern, are so minute that they are meaningless measured against a
practical scale. However, the third response on the $z_{c2}(t)$ and
$z_{c3}(t)$ traces shows that a prediction will cause the z processes
to grow.

Now, suppose that the z processes have all decayed to the point
where a prediction by the outstar results in meaninglessly small grid
x responses. Then if enough predictions are made rapidly enough, we
can "pump up" the z's to the point where the x responses are large
enough to mean something. Grossberg's theorem ensures that the
amplitudes of the x processes will remain in the proper ratios to one
another. Experiment II was continued to demonstrate this memory
"pumping up" and the results are shown in figure 3.3.2. As can be seen,
the outstar's memory was allowed to decay for a while and then the
command event was presented to the outstar three times in rapid
succession. "Pumping up" occurred as expected.

Figure 3.3.2. The results of continuing experiment II. A demonstration
of "pumping up" the memory of a simple outstar with a fast forgetting
rate.

[Figure: input, x response and z process traces plotted against time
(sec).]

A psychological interpretation of memory pumping up would not be
tenuous. It is an everyday occurrence to have a piece of previously
learned information, a name say, on the "tip of one's tongue", but
not be able to recall it until all the associations connected to it
have been recalled. If we consider the name to be inscribed on the
grid of an outstar, then recalling things associated with the name
would be equivalent to rapid excitements of the command node. After
enough such excitements, the name would appear to "pop into one's
head". The memory of the name would then be "fresh" for some time after
being resurrected before it again faded into the "preconscious".
We will introduce a modification in section 3.5 which will make the idea
of a faded memory "popping" into the outstar's "head" more precise.

Of course, a presentation of the pattern after the outstar's z
processes have decayed to small values will also refresh its memory.

Section 3.4 Resistance to Random Mistakes vs. Correction of
Learned Mistakes: A Philosophy for Learning in Outstars


Experiment II was continued to investigate the effects of a
simulated random mistake on a simple outstar with a fast forgetting
rate. The results are shown in figure 3.4.1. Event 1 was presented
at the same time as event 2 to simulate the occurrence of a random
mistake in the pattern. As can be seen from the $x_1$ and $z_{c1}$
traces, the random mistake completely confused the outstar. Whereas the
outstar had previously learned the association $v_c \to v_2$,
occurrence of the random mistake resulted in the outstar remembering
$v_c \to v_2$ and, to only a slightly lesser extent, $v_c \to v_1$. The
amplitudes of the second and third $x_1$ prediction responses in figure
3.4.1 are significant enough to conclude that the random event resulted
in confusion. The memory of a simple outstar with a fast forgetting
rate has very little resistance to random mistakes.

To understand the significance of this outstar's low resistance to random mistakes, we must develop an understanding of the outstar's relationship to its external environment. Up to now, we have just been concerned with the internal workings of the outstar. Now consider that the outstar is a machine which includes the outstar network previously described plus an input apparatus. This machine "lives" in an environment in which events occur. The input apparatus filters the events occurring in the environment and delivers an input pulse to the appropriate node in the outstar when one of the events the outstar is capable of recognizing occurs. The outstar is capable of learning the association between the command event and any events which are represented by grid nodes if they occur approximately $\tau$ time units after occurrence of the command event.

[Figure 3.4.1: traces of the input pulses and of the resulting x and z processes against time (secs).]

Figure 3.4.1. Continuation of experiment II. $P_1(t)$ simulates a random mistake in the pattern previously taught to the outstar.

In order for the outstar's learning ability to conform with intuitive notions about learning, we would want it to learn that the command event is associated with a particular pattern if and only if the occurrence of the command event in the environment is usually followed by the occurrence of the pattern. Suppose the outstar observed one bowling ball colliding with another with the result that the first ball stopped dead and the second bowling ball rolled away from the collision point with the same velocity that the first bowling ball had before the collision. After the first observation of this event, we would expect the intelligent outstar to suspect that it had observed a law of nature that applied to all bowling ball collisions. We would expect the outstar to go from a state of ignorance about the conservation of momentum to an intuitive understanding of it. Philosophically, we desire the outstar to be an inductive learning machine.


If we describe this situation statistically, we may assign probabilities to the occurrence of events in the environment. At any given time, t, we may describe the likelihood of the occurrence of an event associated with the $v_j$ node in the outstar by the probability $PR_j$. Additionally, we can describe the relationship between the occurrence of events with the conditional probability $PR_{j/k}$, which is the probability of the occurrence of event j given that event k occurred recently. In the outstar we are particularly concerned with the probabilities $PR_{i/c}$, where c is the command event and the i are the grid events. To make the outstar an inductive learning machine, we want it to learn $v_c \to v_j$ if and only if $PR_{j/c}$ is large. If $PR_{j/c}$ is small we would want the outstar definitely not to learn $v_c \to v_j$.

On the first occurrence of a pattern following the command event by approximately $\tau$ time units, the outstar can have no idea of how large $PR_{j/c}$ is. Therefore we would want it to only suspect that the command event usually precedes this pattern. However, if the next time the command event occurs it is followed by the pattern, then there is good evidence that $PR_{j/c}$ is large and the outstar should draw this conclusion. Now, in the real world, we expect background noise. That is, if event j does not usually follow the occurrence of event c, there is nevertheless a small probability that it will occur as a random mistake sometime. In order to protect the outstar's memory, we would want it to be resistant to drawing spurious conclusions about the association of the command event with randomly occurring mistakes. If the outstar observed a collision of bowling balls in which one of the balls was shattered into many pieces, we would not want this random occurrence to destroy its confidence in the conservation of momentum.

The memory of a pattern in an outstar is contained in the probabilities:

$$y_i(t) = z_{ci}(t)\left[\sum_k z_{ck}(t)\right]^{-1}$$

The equation describing the z's is:

$$\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\,x_c(t - \tau)\,x_i(t)$$

In the case where u is very small, this is equivalent to:

$$z_{ci}(t) = v\int_0^t x_i(\xi)\,x_c(\xi - \tau)\,d\xi$$

Define

$$I = v\int_0^\infty x_i(\xi)\,x_c(\xi - \tau)\,d\xi$$

where $x_i(t)$ is the response of a grid node to one input pulse in the infinite time period, and $x_c(t - \tau)$ is the prediction signal from the command node $\tau$ time units before the grid event. Thus, if in all time preceding t, the command event has been presented to the outstar M times,

$$z_{ci}(t) \approx M\,(PR_{i/c})\,I$$
Thus, if $PR_{i/c}$ is large, corresponding to a causal association between events c and i in the environment, $z_{ci}(t)$ will be large. On the other hand, if $PR_{j/c}$ is small, corresponding to event j occurring randomly and not causally associated with event c in the environment, $z_{cj}(t)$ will be small. Thus the z's can be considered random variables faithfully reflecting the a priori conditional probabilities in the environment. Note that this reflection of the statistical description of the environment is contained in the amplitudes of the z's and is built up by experience with M presentations of the pattern. The resistance of the simple outstar with a slow forgetting rate in experiment I to random mistakes was due to this correspondence between the amplitudes of the z's and the a priori probabilities in the environment. It may be concluded that whereas the outstar's memory of a pattern is contained in the $y_i(t)$'s, its memory of its experience is contained in the amplitudes of the z processes. Thus, when its memory of its past experience is allowed to be forgotten at a fast rate as in experiment II, the occurrence of a random mistake has disastrous consequences on its memory of the pattern.
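
As a rough illustration of the relations above, the following Python sketch normalizes a set of z amplitudes into the pattern weights $y_i$ and evaluates the slow-forgetting estimate of a z amplitude after M presentations. The function names, the 90% and 5% conditional probabilities and the unit increment I are illustrative assumptions, not values taken from the experiments.

```python
def pattern_weights(z):
    """Pattern weights y_i = z_ci / sum_k z_ck.

    Returns all zeros when nothing has been learned yet (sum of z is zero).
    """
    total = sum(z)
    if total == 0:
        return [0.0 for _ in z]
    return [zi / total for zi in z]


def expected_z(m_presentations, pr_given_command, increment):
    """Slow-forgetting estimate z_ci ~ M * PR_{i/c} * I (u assumed negligible)."""
    return m_presentations * pr_given_command * increment


# Hypothetical environment: event 1 follows the command event 90% of the
# time, event 2 only 5% of the time (a "random mistake").
z = [expected_z(20, 0.90, 1.0), expected_z(20, 0.05, 1.0)]
print(pattern_weights(z))
```

Under these assumed numbers the causally associated event dominates the remembered pattern, which is the sense in which a slowly forgetting outstar resists random mistakes.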

It is not surprising that a machine which forgets its past experience rapidly will be very susceptible to having its mind changed. We may look at this as both a benefit and a drawback. In the slow forgetting outstar of experiment I, the attempt to change its mind about a previously learned pattern by teaching it a new one was only partially successful. It required only two presentations of the original pattern for the outstar to learn it. However, the evidence of the attempt to correct this pattern indicated that many more presentations of the correcting pattern would be required to change its mind. The outstar's resistance to random mistakes was laudable, but its relative inability to change with changing times could be a serious drawback in its environment. On the other hand, the fast forgetting outstar will have no trouble changing its mind with the times, but its low resistance to random mistakes is also a serious drawback.
We may summarize the above heuristic discussion of the constant u in an outstar:
(a) A small u implies:
(i) past experience is slowly forgotten
(ii) high resistance to random mistakes
(iii) low correctability of previously learned mistakes.
(b) A large u implies:
(i) past experience is rapidly forgotten
(ii) low resistance to random mistakes
(iii) high correctability of previously learned mistakes.

In addition, we must consider one further effect of the constant u on the performance of an outstar:
(c) A small u results in uncontrolled growth of the grid x processes' amplitudes.


Again it is stressed that "large" and "small" u's refer to whether the characteristic decay time 1/u is long or short relative to the expected time interval between presentations to and/or predictions by the outstar.
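
The trade-off in the constant u can be illustrated with a deliberately crude model. The sketch below assumes that each presentation adds a fixed increment to the z amplitude of every grid event presented, and that z decays by a factor exp(-u * interval) between presentations. This impulsive model, the function name and all numerical values are assumptions made for illustration; it is not the simulation used in this study.

```python
import math


def mistake_weight(u, n_correct=5, interval=10.0, increment=1.0):
    """Pattern weight given to a random mistake under a crude decay model.

    The correct pattern event is presented n_correct times, then once more
    together with a mistaken event. Returns z_mistake / (z_pattern + z_mistake).
    """
    decay = math.exp(-u * interval)  # forgetting between presentations
    z_pattern = 0.0
    for _ in range(n_correct):
        z_pattern = z_pattern * decay + increment
    # Final presentation: the pattern event and the mistake occur together.
    z_pattern = z_pattern * decay + increment
    z_mistake = increment
    return z_mistake / (z_pattern + z_mistake)


# Decay time 1/u long (1000) or short (1) relative to the interval of 10.
print("small u:", mistake_weight(0.001))
print("large u:", mistake_weight(1.0))
```

With the small u the accumulated experience keeps the mistake's weight low; with the large u the single mistaken presentation carries nearly as much weight as the pattern itself.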

Because of condition (c) above, a practical simple outstar requires a large u. Thus design improvements to the fast forgetting outstar which result in greater resistance to random mistakes are desirable. In the next several chapters we shall introduce more complicated outstars which exhibit improved noise resistance without the x process amplitude problems of the simple outstar. However, for the present, we still have an avenue open for increasing the simple outstar's noise resistance.

Part of the reason for the poor noise resistance of the simple outstar in experiment II was that v was selected by the "one presentation means well learning" criterion. Thus presentation of a random mistake once resulted in its being well learned. Had we selected a smaller v and required more presentations of the pattern in rapid succession to result in well learning, then the effect of the random mistake would be smaller. At the same time the correctability of previously learned mistakes would decrease. If we wish to make the noise resistance of the outstar very good by this method, then we must be content with an outstar of slow intelligence that requires having a pattern drummed into its head before it learns it; or, we could use the pumping up phenomenon of the outstar and have it think about a pattern presented to it many times in rapid succession before it is well learned. Selection of the proper v to be used in an outstar is a design decision which must take into account this trade off.





section 3.5 The Occurrence of a Pattern of Events over a Period of Time; Thresholds


In experiments I and II, the grid node events in a pattern were always presented exactly $\tau$ time units after the command event. The reason for this is that it takes $\tau$ time units for the x response to the command input pulse to travel along the directed edges to the arrowheads impinging on the grid nodes. Until the prediction signal $x_c(t - \tau)$ arrives at the arrowheads there can be no correlation between $x_c(t - \tau)$ and the x process response to a grid input pulse at the adjacent grid node. Thus no learning can occur until the prediction signal begins to arrive at the arrowheads. However, we have seen indications that learning does occur with grid events presented at times other than $\tau$ time units after presentation of the command event. In this section we shall examine this phenomenon, but first we must develop a notion that will make discussion of this phenomenon easier. If we are going to study how well an outstar learns associations between the command event and grid events which may occur more than or less than $\tau$ time units after presentation of the command event, we will need a method of describing when these events occur. Measuring the occurrence of grid events relative to the occurrence of the command event is not a very good idea. No learning can occur until the prediction signal has arrived at the arrowheads. The transmission time delay $\tau$ is a rather arbitrary time interval which may be changed from outstar to outstar.

However, once the prediction signal begins to arrive at the arrowheads, the outstar will begin to learn the pattern on the grid nodes independent of how long it took the prediction signal to travel from the command node. Thus a good reference point for describing the occurrence of grid events is the time instant when the prediction signal begins to arrive at the arrowheads. We shall denote this instant in time as $\Phi = 0$ and let $\Phi$ be the time measured relative to $\Phi = 0$ at which grid events are presented. Grid events which occur before $\Phi = 0$ will be said to occur at negative values of $\Phi$ and grid events which occur after $\Phi = 0$ will be said to occur at positive values of $\Phi$. $\Phi$ shall be called the phase of an event with respect to the prediction signal, or simply the presentation phase. To be precise, $\Phi$ will be defined as follows. Let $t_a$ be the time instant at which the prediction signal begins to arrive at the arrowheads. Let $t_g$ be the time instant at which a grid node input pulse begins to be nonzero. Then $\Phi$ is:

$$\Phi = t_g - t_a$$

[Figure 3.5.1: upper traces show the input pulse shape ($\delta = 1/\alpha$), the prediction signal $x_c(t - \tau)$, the grid node response $x_i(t)$ (with input, and prediction only) and the z process, against time in units of $\delta$; the lower plot shows the increase in z process amplitude against presentation phase $\Phi$ in terms of $\delta$.]

Figure 3.5.1. The upper traces show the input pulse used, the resulting prediction signal, the response of a grid node to an event of $\Phi = 0$ presentation phase, and the response of the z process associated with that node. The bottom curve shows the phase-correlation curve and the irreducible phase-correlation curve.

The following experiment was performed. A practical simple outstar with a fast forgetting rate and many grid nodes was set up. The constant v was selected to result in well learning in one presentation of a grid event with $\Phi = 0$ presentation phase. Then each of the grid nodes was excited with events presented with various presentation phases. The z processes were all given zero initial conditions. The maximum amplitude of the z processes attained during the experiment was plotted against the presentation phase $\Phi$. Lacking any better name for a curve showing the variation of z process amplitudes with the presentation phase, the curve will arbitrarily be called a "phase-correlation" curve. A phase-correlation curve is shown at the bottom of figure 3.5.1.








Figure 3.5.1 shows a variety of things besides a "phase-correlation" curve. The top trace in figure 3.5.1 shows the shape and dimensions of the input pulse used in the experiment. The $x_c(t - \tau)$ trace shows what the prediction signal looked like as it arrived at the arrowheads. The first response of the $x_i(t)$ trace shows what the x process response looked like for a grid node excited by an event presented with $\Phi = 0$ presentation phase. The second response on the $x_i(t)$ trace shows what a prediction response for this grid node looks like. The $z_{ci}(t)$ trace shows what the z process in the arrowhead impinging on the above $v_i$ grid node looked like. The irreducible phase-correlation curve shown is related to the phase-correlation curve and will be explained shortly.

The additional information shown in figure 3.5.1 is provided as a pictorial look at the various processes going on in an outstar. This information was gathered from a number of experiments and will be compared to the results of the next section in which we study the effects of using other input pulses in an outstar. Thus the actual numerical values for the amplitudes of the processes shown are somewhat meaningless. To allow comparisons to be made, the data in figure 3.5.1 were plotted as functions of various network parameters.

In the preceding experiments we have followed the convention, in assigning values to the x process rise rate $\alpha$ and the input pulse duration $\delta$, of setting $\delta = 1/\alpha$. The time interval $\delta = 1/\alpha$ describes two important time intervals in the network: the input pulse duration, and the rise time of the x processes. Since this study is limited to input pulses of duration $\delta$ and since we have assigned $\alpha$ such that $1/\alpha = \delta$ throughout, a natural selection for a time unit among the experimental parameters is $\delta = 1/\alpha$. The time axes in figure 3.5.1 are thus in terms of $\delta = 1/\alpha$. Since the time constant associated with the z processes is the decay time 1/u, this period is shown on the z traces.

The analytical solution for an x process responding to a rectangular input pulse of amplitude A and duration $\delta$ presented at time $t = t_p$ is:

$$x(t) = \begin{cases} (A/\alpha)\left(1 - e^{-\alpha [t - t_p]^+}\right) & \text{for } t_p \le t \le t_p + \delta \\ (A/\alpha)\left(1 - e^{-\alpha\delta}\right) e^{-\alpha [t - (t_p + \delta)]^+} & \text{for } t_p + \delta \le t \end{cases}$$

where the notation $[\;]^+$ is defined by:

$$[y]^+ = \begin{cases} y & \text{for } y > 0 \\ 0 & \text{for } y \le 0 \end{cases}$$

Note that this solution is valid independent of the numerical values assigned to A, $\alpha$, and $\delta$ as long as $\delta = 1/\alpha$. Thus the amplitudes for the x processes are always proportional to $A/\alpha$ and this combination of experimental parameters was used as the amplitude axes for the x processes shown in figure 3.5.1.
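
The closed-form response can be checked numerically. The sketch below evaluates the piecewise solution and compares it with a forward-Euler integration of a first-order node equation dx/dt = -alpha*x + P(t). That node equation is assumed here because the stated solution satisfies it; the function names, step size and parameter values are illustrative only.

```python
import math


def rectify(y):
    """The [y]^+ notation: y for y > 0, otherwise 0."""
    return y if y > 0 else 0.0


def x_response(t, amplitude, alpha, delta, t_p=0.0):
    """Closed-form x process response to a rectangular pulse starting at t_p."""
    if t <= t_p + delta:
        return (amplitude / alpha) * (1.0 - math.exp(-alpha * rectify(t - t_p)))
    peak = (amplitude / alpha) * (1.0 - math.exp(-alpha * delta))
    return peak * math.exp(-alpha * rectify(t - (t_p + delta)))


def x_euler(t_end, amplitude, alpha, delta, t_p=0.0, dt=1e-4):
    """Forward-Euler integration of the assumed dx/dt = -alpha*x + P(t), x(0) = 0."""
    x = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        pulse = amplitude if t_p <= t < t_p + delta else 0.0
        x += dt * (-alpha * x + pulse)
    return x


# With delta = 1/alpha the peak is (A/alpha) * (1 - 1/e), about 0.632 * A/alpha.
A, alpha = 3.0, 2.0
delta = 1.0 / alpha
print(x_response(delta, A, alpha, delta), x_euler(delta, A, alpha, delta))
```

The comparison also makes the scaling argument concrete: under the convention $\delta = 1/\alpha$ the peak depends on the parameters only through $A/\alpha$.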

The equation for the z processes is nonlinear and an analytical solution was not found in this study. A combination of experimental parameters was sought to scale the amplitude axes for the z process traces and the phase-correlation curves. It was desired that a plot of a z process against this scale factor would be the same for all experiments even though the numerical values of the parameters in the experiments were different. At the beginning of the experimental study, the parameter combination $v(A/\alpha)^2\delta$ seemed to work well and was therefore adopted. However, later experiments showed that this scale factor did not work well. Nevertheless, it was retained to allow comparisons. With this explanation of the scales for the axes of the plots in figure 3.5.1, we may proceed with a discussion of the phase-correlation curves.

The phase-correlation curve in figure 3.5.1 shows the maximum increase in amplitude of a z process due to the correlation between the prediction signal and a grid node x process excited by an event presented with presentation phase $\Phi$. As can be seen, the maximum increase in amplitude for a z process occurs when a grid node is excited by an event with $\Phi = 0$ presentation phase. Events presented with $\Phi \ne 0$, indicating that they were presented before or after the arrival of the prediction signal at the arrowheads, result in a lesser increase in z process amplitude. For $|\Phi| > 3\delta = 3/\alpha$, there is no appreciable increase in z process amplitude.

The effect of the phenomenon revealed by the phase-correlation curve may be interpreted in a number of ways. Suppose that a command event is presented to the outstar at time $t_c$. Suppose further that a collection of grid events, 1, 2, ..., M, usually accompany the occurrence of the command event in the environment. However, suppose that these grid events do not all occur at the same time. Let each one occur at time $t_1, t_2, \ldots, t_M$. The prediction signal generated by the command event will arrive at the arrowheads at time $t_c + \tau$. The phase-correlation curve tells us that the outstar will learn to some extent that all the grid events which occur at times $t_i$ such that:

$$|t_i - (t_c + \tau)| < 3\delta = 3/\alpha$$

are associated with the command event. Note that $t_i - (t_c + \tau)$ is the presentation phase $\Phi_i$ for the ith event. The phase-correlation curve tells us further that those events which occur at times $t_i$ such that:

$$|t_i - (t_c + \tau)| < 0.5\delta = 1/(2\alpha)$$

will be learned to be associated with the command event very well.
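
The two windows described above can be written as a small classifier. In this sketch the presentation phase is taken as the grid event time minus the arrival time of the prediction signal, so that events after the arrival have positive phase. The function names and the three labels are hypothetical and chosen only for illustration.

```python
def presentation_phase(t_grid, t_command, tau):
    """Phase of a grid event relative to arrival of the prediction signal."""
    return t_grid - (t_command + tau)


def learning_class(phase, delta):
    """Classify an event by the windows read off the phase-correlation curve."""
    if abs(phase) < 0.5 * delta:
        return "well learned"
    if abs(phase) < 3.0 * delta:
        return "partly learned"
    return "not learned"


# Hypothetical timing: command at t = 0, delay tau = 2, delta = 1/alpha = 1.
for t_grid in (2.25, 4.0, 6.0):
    phase = presentation_phase(t_grid, 0.0, 2.0)
    print(t_grid, phase, learning_class(phase, 1.0))
```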

One interpretation of this information is that we now have a means by which we can intelligently specify $\tau$ in an outstar. We have said nothing about when a command event occurs relative to a pattern of grid events. In everyday experience we are confronted with situations in which the occurrence of a command event results in the occurrence of a "pattern" of events. The time delay between occurrence of the command event and occurrence of the pattern varies from situation to situation. We learned that the command event of switching an electric light switch resulted almost immediately in the pattern of the electric lights in a room going on. We also learned that the command event of putting a seed in the ground resulted days later in the "pattern" of a plant sprouting. In designing an outstar functioning in a "real" environment, specification of $\tau$ should be made according to the average time delay between occurrence of command events and the associated patterns that the outstar is capable of learning. The phase-correlation curve tells us what the standard deviation of this time delay can be and still result in the outstar being able to learn.

On the other hand, the phenomenon shown by the phase-correlation curve is a source of errors in an outstar avalanche. Suppose that the command nodes in an avalanche command node cascade are so arranged that there is a fixed time interval between excitement of one command node and excitement of the next command node in the cascade. This means that the avalanche takes "pictures" of the time varying pattern of grid events once every such interval to make a sampled data approximation of the pattern. From the phase-correlation curve of figure 3.5.1 we can see that if this interval is less than $3\delta = 3/\alpha$, the pictures taken by the outstars in the avalanche will overlap one another. That is, each outstar will learn to some extent the same pattern of events that the preceding outstar learns. In particular, suppose that the pattern of events is varying rapidly enough that the pattern of grid events at time $t + \delta$ is significantly different from that at time t. To get an accurate sampled data approximation in this situation, the avalanche would have to take a "picture" every $\delta$ time units and we would set the interval between command nodes equal to $\delta$. However, the phase-correlation curve shows us that in this case each outstar will learn not only the pattern of events on the grid when its prediction signal arrives at the arrowheads, but also the pattern of events that was on the grid when the prediction signal from the preceding outstar arrived at the arrowheads. In this situation, the avalanche's sampled data approximation will be seriously in error.
The phenomenon shown by the phase-correlation curve in figure 3.5.1 is due to two things. First, the input pulses used in the experiment were rectangular and of duration $\delta$. Suppose that the equation for the x processes was such that the x processes exactly reproduced the input pulse. That is:

$$x(t) = P(t)$$

Then the prediction signal and the x processes' responses would be rectangular in shape and of duration $\delta$. The z process correlates the prediction signal with the grid node x process. Thus we could expect the z process amplitude increase due to a correlation to be proportional to the correlation between the rectangular prediction signal and the rectangular grid node x process. If the grid node is excited by an event which occurs with presentation phase $\Phi$ with respect to the arrival of the prediction signal, we get:

$$z(t) \propto \begin{cases} \left[\int_{\Phi}^{\delta} dt\right]^+ & \text{for } \Phi > 0 \\ \left[\int_{0}^{\delta + \Phi} dt\right]^+ & \text{for } \Phi < 0 \end{cases}$$

or

$$z(t) \propto \begin{cases} [\delta - \Phi]^+ & \text{for } \Phi > 0 \\ [\delta + \Phi]^+ & \text{for } \Phi < 0 \end{cases}$$

This is just the correlation between two rectangular pulses of duration $\delta$ whose leading edges are separated in time by $\Phi$. This function is shown in figure 3.5.1 as the "irreducible phase-correlation" curve. This curve is called irreducible because it shows what the phase-correlation curve would look like if the x processes exactly reproduced the input pulse.
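
The irreducible curve for rectangular pulses is simple enough to tabulate directly. The sketch below evaluates the overlap of two unit-height rectangular pulses of duration delta whose leading edges are separated by the presentation phase; the function names and the unit pulse height are assumptions for illustration.

```python
def rectify(y):
    """The [y]^+ notation: y for y > 0, otherwise 0."""
    return y if y > 0 else 0.0


def irreducible_phase_correlation(phase, delta):
    """Overlap of two rectangular pulses of duration delta separated by phase."""
    if phase >= 0:
        return rectify(delta - phase)
    return rectify(delta + phase)


# The curve is a triangle of half-width delta centered on zero phase.
for phase in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(phase, irreducible_phase_correlation(phase, 1.0))
```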

As we have seen, the x processes do not exactly reproduce the input pulses. This is because embedding field network nodes are low pass filters. We have seen, and our analytical solution shows, that the x processes' response decays exponentially away from the maximum value it obtained during the presentation of the input pulse. This exponentially decaying portion of an x process response will be called a "tail". These tails account for the difference between the irreducible phase-correlation curve and the phase-correlation curve. Because of the tails, events presented with presentation phase $\Phi$ such that $\Phi < 0$ still have nonzero amplitudes to correlate with the prediction signal when it arrives at the arrowheads. Prediction signals also have tails which correlate with grid node x process responses to events presented with presentation phase $\Phi > 0$. As can be seen, this effect begins to become important for events presented with presentation phase $|\Phi| < 3\delta$.

In an avalanche with a fast sampling rate, modifications of the component outstars that result in a phase-correlation curve which more closely resembles the irreducible phase-correlation curve are important. One modification would be to increase the x process rise rate $\alpha$. Making $\alpha$ very large will result in x process responses that will very closely follow the shape of the input pulses. Thus the phase-correlation curve should be very close to the irreducible phase-correlation curve.

However, increasing $\alpha$ is not always possible. In this study, increasing $\alpha$ resulted either in intolerable errors or in extremely lengthy computer runs to perform an experiment. Appendix A explains the error-computation time trade off in selection of $\alpha$ for the digital simulations of this study.

If $\alpha$ cannot be increased enough to make the phase-correlation curve sufficiently close to the irreducible phase-correlation curve, there are other methods which will accomplish this. Grossberg has proposed the use of thresholds. The equations for a simple outstar with thresholds are:

$$\dot{x}_c(t) = -\alpha\,x_c(t) + P_c(t) \tag{3.5.1}$$

$$\dot{x}_i(t) = -\alpha\,x_i(t) + \beta\,y_i(t)\left[x_c(t - \tau) - \Gamma_c\right]^+ + P_i(t) \tag{3.5.2}$$

$$\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\left[x_c(t - \tau) - \Gamma_c\right]^+\left[x_i(t) - \Gamma_i\right]^+ \tag{3.5.3}$$

where:

$$[y]^+ = \begin{cases} y & \text{for } y > 0 \\ 0 & \text{for } y \le 0 \end{cases}$$

[Figure 3.5.2: traces against time in units of $\delta$, with the decay time 1/u marked on the z trace; the lower plot shows the increase in z process amplitude against presentation phase $\Phi$ in terms of $\delta$, with the phase-correlation curve and the irreducible phase-correlation curve.]

Figure 3.5.2. Illustration of the effects of thresholds on a simple outstar. Equivalent thresholds are placed on both the command node and the grid nodes. Note how close the phase-correlation curve is to the irreducible phase-correlation curve.





$\Gamma_c$ is the command node threshold. As can be seen, it prevents the prediction signal $x_c(t - \tau)$ from exciting a grid x process until $x_c(t - \tau) > \Gamma_c$. Additionally, it prevents the prediction signal from being correlated with the grid nodes' x processes until $x_c(t - \tau)$ is suprathreshold. The grid node threshold, $\Gamma_i$, performs the same function. In effect, these thresholds will cut off the "tails" of the x processes and thus should result in a phase-correlation curve which closely resembles the irreducible phase-correlation curve.
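
The tail-cutting effect of thresholds can be sketched numerically. The code below integrates only the z process, driven by the rectified, thresholded product of a prediction signal and a grid node response, and reports the largest z reached for a given presentation phase. It is a simplified sketch under stated assumptions: both x signals are taken to be the closed-form response of a first-order node to a rectangular pulse of duration 1/alpha, the prediction signal's own excitation of the grid node is ignored, and all names and parameter values (u = 0.2, thresholds of 0.3) are hypothetical.

```python
import math


def rectify(y):
    """The [y]^+ notation: y for y > 0, otherwise 0."""
    return y if y > 0 else 0.0


def x_pulse_response(t, t_start, amplitude=1.0, alpha=1.0):
    """Assumed x response to a rectangular pulse of duration 1/alpha at t_start."""
    delta = 1.0 / alpha
    s = t - t_start
    if s <= 0:
        return 0.0
    if s <= delta:
        return (amplitude / alpha) * (1.0 - math.exp(-alpha * s))
    peak = (amplitude / alpha) * (1.0 - math.exp(-1.0))
    return peak * math.exp(-alpha * (s - delta))


def max_z(phase, gamma_c=0.0, gamma_i=0.0, u=0.2, v=1.0, alpha=1.0, dt=1e-3):
    """Largest z reached when a grid event arrives `phase` after the prediction.

    Forward-Euler integration of
    dz/dt = -u*z + v*[x_c - gamma_c]^+ * [x_i - gamma_i]^+ with z(0) = 0.
    """
    delta = 1.0 / alpha
    t0 = min(0.0, phase)
    steps = int(round((12.0 * delta + abs(phase)) / dt))
    z = 0.0
    z_max = 0.0
    for k in range(steps):
        t = t0 + k * dt
        prediction = rectify(x_pulse_response(t, 0.0, alpha=alpha) - gamma_c)
        grid = rectify(x_pulse_response(t, phase, alpha=alpha) - gamma_i)
        z += dt * (-u * z + v * prediction * grid)
        z_max = max(z_max, z)
    return z_max


# Compare the curve with and without (hypothetical) thresholds of 0.3.
for phase in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(phase, max_z(phase), max_z(phase, gamma_c=0.3, gamma_i=0.3))
```

In this sketch the unthresholded curve still shows growth at a phase of two pulse durations, while the thresholded curve is exactly zero there because the suprathreshold portions of the two signals no longer overlap.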

Figure 3.5.2 shows the results of an experiment conducted with an outstar with thresholds. The command node threshold $\Gamma_c$ used in this experiment was selected to make the time interval during which the prediction signal is suprathreshold approximately $\delta$ time units in duration, as can be seen from the $x_c(t)$ trace. The grid node threshold $\Gamma_i$ was selected to be the same, $\Gamma_i = \Gamma_c$. As can be seen, the phase-correlation curve very closely approximates the irreducible phase-correlation curve. Using thresholds, we could make an avalanche which could accurately sample a time varying pattern every $2\delta$ time units. Without thresholds, the shortest the accurate sampling interval could be is about $6\delta$ time units as shown in figure 3.5.1. Thus the addition of thresholds has increased the accurate sampling rate for an avalanche by a factor of three.

However, this possible increase in the accurate sampling rate for an avalanche has not been obtained without a cost. The $x_i(t)$ traces and the $z_{ci}(t)$ traces in figure 3.5.2 are for a grid node excited by an event with presentation phase $\Phi = 0.5\delta$. Looking closely at the $x_i(t)$ trace, one can see that in the first response, $P_i(t)$ drove $x_i(t)$ above threshold and thus $z_{ci}(t)$ grew. However, on the second response, the excitement of $x_i(t)$ was insufficient to drive it suprathreshold and thus $z_{ci}(t)$ continued its exponential decay. Lacking the ability to drive $x_i(t)$ suprathreshold, the outstar cannot "pump up" the $z_{ci}(t)$ process and we must conclude that this memory is bound for extinction. In the same way, if the z processes of the other grid nodes are allowed to decay further, prediction excitement of their x processes will also be unable to drive them suprathreshold and all memory of the pattern would be bound for extinction. In the simple outstar without thresholds, we saw that no matter how much the z processes decayed, we could still recover the information stored in them by "pumping up". Thus, although a memory could fade due to forgetting, it could not be absolutely forgotten. An outstar with grid node thresholds can absolutely forget a pattern it has learned.

[Figure 3.5.3: increase in z process amplitude against presentation phase $\Phi$ in terms of $\delta$, showing the phase-correlation curve and the irreducible phase-correlation curve.]

Figure 3.5.3. Effect on the phase-correlation curve of a threshold placed on the command node only.
To prevent a memory from being absolutely forgotten, we must set $\Gamma_i = 0$. This was done and a series of experiments were performed to determine the phase-correlation curve. Figure 3.5.3 shows the results. The only x process "tail" that was cut off by a threshold was the prediction signal's. Thus, the phase-correlation curve for $\Phi > 0$ is very close to the irreducible phase-correlation curve. This is because events with presentation phase $\Phi > 0$ occur after the prediction signal has arrived at the arrowheads. Cutting the prediction signal's tail off prevents it from correlating with x process responses to events presented with presentation phases greater than the time interval during which the prediction signal is suprathreshold. In this case, this meant no correlation with x processes responding to events presented with presentation phase $\Phi > \delta$. On the other hand, the x processes retained their "tails" because $\Gamma_i = 0$. Thus the "tails" of grid node responses to events occurring before the prediction signal arrived at the arrowheads ($\Phi < 0$) were available for correlation. This explains why the phase-correlation curve for $\Phi < 0$ in figure 3.5.3 is similar to the phase-correlation curve for an outstar without thresholds.

In addition to making the phase-correlation curve for an outstar closer to the irreducible phase-correlation curve, thresholds may be used for an interpretative purpose. Since the prediction signal $[x_c(t - \tau) - \Gamma_c]^+$ could not affect the grid nodes until $x_c(t - \tau)$ was suprathreshold, we could follow the convention of saying that an x process at a node does not indicate a response by that node until it is suprathreshold. We could still set $\Gamma_i = 0$ in equation 3.5.3 and place an imaginary threshold on the grid nodes. With this interpretative convention, we have a concrete relationship between the amplitudes of the x processes and the psychological idea of a response from a subject. Additionally, the phenomenon of a faded memory popping up into the outstar's consciousness during "pumping up" is given a concrete interpretation.





section 3.6 Other Input Pulse Shapes 


A short study was made of the effects on a simple outstar of using input pulses with shapes other than rectangular. The results were that there appear to be no qualitative differences in the performance of an outstar using any input pulse of duration less than or equal to $1/\alpha$. The sole exception to this qualitative finding was that the choice of input pulses does affect the shape of the phase-correlation curve.

Quantitatively, the input pulse did affect the maximum amplitude of the x responses. Additionally, the magnitude of v required to meet a specific well learning criterion was affected.

One important result of this study was that the maximum amplitude
of a grid node x process responding to a prediction signal alone was
at approximately $1/\alpha$ time units after arrival of the prediction
signal for all input pulses. If we consider the input apparatus of the
outstar to be a data sampler which samples the environment at time
$t_0$ and delivers appropriate input pulses to the outstar's nodes,
then this effect can be considered to be an inherent time delay in the
outstar's prediction. That is, an event which occurs in the environment
at time $t_0$ is predicted by the outstar to occur at time
$t_0 + 1/\alpha$.
Figures 3.6.1, 3.6.2, and 3.6.3 show the results for the pulses
used in this study. They should be compared to figure 3.5.1 which shows
similar results for a rectangular pulse. The irreducible phase-correlation
curves in these figures were computed by analytically correlating the
input pulses.
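As an aid to reading the irreducible curves in these figures, the
following sketch correlates a pulse with a shifted copy of itself by
direct numerical summation. It is an illustration under stated
assumptions only: the function names, the unit pulse amplitude and
duration, the step size, and the reading of the irreducible
phase-correlation curve as the plain correlation of two identical input
pulses are assumptions made here and are not taken from the simulation
programs used in this study.

```python
def rectangular(t, delta=1.0, amp=1.0):
    """Rectangular pulse of height amp on 0 <= t < delta (assumed shape)."""
    return amp if 0.0 <= t < delta else 0.0


def triangular(t, delta=1.0, amp=1.0):
    """Symmetric triangular pulse of peak amp on 0 <= t <= delta (assumed shape)."""
    if t < 0.0 or t > delta:
        return 0.0
    return amp * (1.0 - abs(2.0 * t / delta - 1.0))


def pulse_correlation(pulse, phase, delta=1.0, dt=1e-3):
    """Midpoint-rule estimate of the integral of pulse(t) * pulse(t - phase) dt."""
    n = int(round(3.0 * delta / dt))
    total = 0.0
    for k in range(-n, n):
        t = (k + 0.5) * dt  # midpoints keep samples off the pulse edges
        total += pulse(t, delta) * pulse(t - phase, delta)
    return total * dt


def correlation_curve(pulse, phases, delta=1.0):
    """Correlation evaluated at each presentation phase in phases."""
    return [pulse_correlation(pulse, p, delta) for p in phases]


if __name__ == "__main__":
    phases = [-1.0, -0.5, 0.0, 0.5, 1.0]
    print("rectangular:", correlation_curve(rectangular, phases))
    print("triangular: ", correlation_curve(triangular, phases))
```

For a rectangular pulse the correlation falls off linearly from its
peak at zero phase and vanishes once the phase exceeds the pulse
duration, which is the triangular shape one would expect for the
rectangular case of figure 3.5.1.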


[Figure 3.6.1 panels: input pulse shape (triangular, $\delta = 1/\alpha$);
x responses with input prediction only; increase in z process;
phase-correlation curve and irreducible phase-correlation curve,
amplitude versus presentation phase $\phi$ (in terms of $\delta$).]

Figure 3.6.1. The response of an outstar to a triangular input pulse.


[Figure 3.6.2 panels: input pulse shape (exponential, $\delta = 1/\alpha$);
x responses with input prediction only; increase in z process;
phase-correlation curve and irreducible phase-correlation curve,
amplitude versus presentation phase $\phi$ (in terms of $\delta$).]

Figure 3.6.2. The response of an outstar to an exponential input pulse.


[Figure 3.6.3 panels: input pulse shape (impulse); x responses with
input prediction only, time axis in units of $1/\alpha$; increase in z
process; phase-correlation curve, amplitude versus presentation phase
$\phi$. The irreducible phase-correlation curve is an impulse at the
origin.]

Figure 3.6.3. The response of an outstar to an impulse input pulse.







CHAPTER 4 LATERAL INHIBITION 


section 4.1 Introduction to Lateral Inhibition

The last chapter showed that a practical outstar (one with a fast
forgetting rate) had the major drawback of either being a slow
learner or having very low resistance to random mistakes. This was
due to its inability to additively sum its past experience in the z
processes because of the large decay rate. In this chapter we will
study a more complicated outstar which retains all the desirable
qualities of the simple outstar with a fast forgetting rate and has
the further property that it is resistant to random mistakes.

The additive summing of past experience in the slowly forgetting
outstar of chapter 3 resulted in good resistance to random mistakes
because this outstar's experience with the correct pattern was so great
that it could absorb mistakes. The opposite of this passive absorption
of mistakes would be to use the past experience to actively suppress a
mistake when it occurs. The psychological term for active suppression
is inhibition. Figure 4.1.1 shows the geometric schematic and the
equations for a laterally inhibiting outstar. The equations governing
its performance are here repeated for convenience:


4.1.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

4.1.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\, x_c(t - \tau) - \beta^- \sum_{k=1,\, k \neq i}^{N} [x_k(t - \tau^-)]^+$

4.1.3   $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\, [x_c(t - \tau)\, x_i(t)]^+$

The notation $[y]^+$ means the maximum of the variable y, or 0, as
in the case with thresholds. A short discussion of the significant
differences between equations 4.1 and those for a simple outstar follows.
[Figure 4.1.1: geometric schematic of the laterally inhibiting outstar
and the equations governing network performance (4.1.1 through 4.1.3).]

Figure 4.1.1. An outstar with lateral inhibition. The double
lined directed edges transmit inhibitory signals. Only three
grid nodes are shown (N = 3).





A negative prediction signal $-\beta^- \sum_{k \neq i} [x_k(t - \tau^-)]^+$
has been added to the equation for the grid nodes' x processes. Here
$\beta^-$ is the inhibitory coupling coefficient. This is the net
inhibitory signal sent to grid node i from all the other nodes in the
grid. These inhibitory signals are sent along the double lined
directed edges in figure 4.1.1. The transmission delay from the
originating node to the receiving node is $\tau^-$. Note that a grid node
sends an inhibitory signal only if its x process is positive. With
inhibition, it is possible for an x process to have negative amplitudes.
We shall adhere to the convention of considering that a node is
responding only if its x process is positive. Although we will be
able to measure the negative excursions of the x processes, they shall
be considered equivalent to zero amplitudes in the simple outstar.

In the simple outstar, zero or small amplitudes were interpreted
as no response. In the laterally inhibiting outstar, negative
amplitudes mean that the node is in an inhibited state. Using the above
convention for interpreting the response of a node implies that an
inhibited node is in a super non responding state. Limiting a node's
ability to affect other nodes via the inhibitory signals to only
those times when its x process is positive is consistent with the above
convention.

No learning occurs in the arrowheads of the inhibitory directed
edges. The z process in those arrowheads can be considered to always
have a value of unity.

Equation 4.1.3 for the z processes located in the arrowheads
of the directed edges from the command node is the same as that for
a simple outstar. Again, a node's inhibited state is ignored by
the correlation driving function $v[x_c(t - \tau)\, x_i(t)]^+$. Thus the
z processes can only have non negative values. For this reason this
outstar is an excitatory biased machine. We will have occasion in a
later chapter to investigate outstars which allow negative z processes
and are more neutrally biased.

The rationale for lateral inhibition is to have a responding
grid node inhibit all the other grid nodes. When several grid nodes
are responding at the same time, we expect the node responding with
the greatest amplitude to inhibit the other nodes the most while
suffering the least inhibition itself. When a random mistake occurs
in a previously learned pattern, the prediction signal inputs to the
grid nodes will cause the nodes corresponding to events in the pattern
to respond with greater amplitude than the nodes corresponding to the
mistake. This will result in inhibition of the response to the mistake.
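The behavior described in this section can be explored with a small
forward-Euler integration of the laterally inhibiting outstar
equations. The sketch below is an assumption-laden illustration and is
not the simulation program used for the experiments: the function name
`simulate`, the Euler step, the rectangular pulse representation, the
default parameter values, and the names `beta_inh` and `tau_inh` for
the inhibitory coefficient and inhibitory delay are all choices made
here for illustration.

```python
def simulate(pulses, n_grid=3, t_end=3.0, dt=0.001,
             alpha=3.3333, beta=1.0, u=0.556, v=2.44,
             tau=0.3, beta_inh=2.38, tau_inh=0.1):
    """Forward-Euler integration of a laterally inhibiting outstar.

    pulses maps a node ('c' for the command node, or a grid index
    0..n_grid-1) to a list of (start, duration, amplitude) rectangular
    input pulses. Returns (times, xc, x, z), where x[i] and z[i] are the
    traces of grid node i and of the z process in its arrowhead.
    """
    steps = int(round(t_end / dt))
    d_exc = int(round(tau / dt))      # excitatory delay in steps
    d_inh = int(round(tau_inh / dt))  # inhibitory delay in steps

    def drive(node, t):
        return sum(a for (t0, dur, a) in pulses.get(node, ())
                   if t0 <= t < t0 + dur)

    def pos(y):
        # The [y]^+ operation: maximum of y and 0.
        return y if y > 0.0 else 0.0

    times, xc = [0.0], [0.0]
    x = [[0.0] for _ in range(n_grid)]
    z = [[0.0] for _ in range(n_grid)]
    for k in range(steps):
        t = k * dt
        xc_del = xc[k - d_exc] if k >= d_exc else 0.0
        x_del = [pos(x[j][k - d_inh]) if k >= d_inh else 0.0
                 for j in range(n_grid)]
        total_inh = sum(x_del)
        new_xc = xc[k] + dt * (-alpha * xc[k] + drive('c', t))
        for i in range(n_grid):
            # Inhibition comes from every grid node except node i itself.
            inhibition = beta_inh * (total_inh - x_del[i])
            dx = (-alpha * x[i][k] + drive(i, t)
                  + beta * z[i][k] * xc_del - inhibition)
            dz = -u * z[i][k] + v * pos(xc_del * x[i][k])
            x[i].append(x[i][k] + dt * dx)
            z[i].append(z[i][k] + dt * dz)
        xc.append(new_xc)
        times.append(t + dt)
    return times, xc, x, z


if __name__ == "__main__":
    # Command event at t = 0, one grid event tau time units later.
    demo = {'c': [(0.0, 0.3, 10.0)], 1: [(0.3, 0.3, 10.0)]}
    _, _, x, z = simulate(demo)
    print("peak z for taught node:", max(z[1]))
    print("most negative x of an untaught node:", min(x[0]))
```

In this sketch the untaught nodes are driven to negative amplitudes by
the responding node while their z processes remain at zero, which is
the inhibited state discussed above.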




section 4.2 Experimental Study of an Outstar with Lateral Inhibition


To test the claim that the laterally inhibiting outstar has good
noise resistance, we shall repeat experiment II which was performed
with the simple outstar. All the parameter specifications for that
experiment will be retained. However, we have two new parameters to
specify, $\beta^-$ and $\tau^-$.

If it takes too long for inhibitory signals to travel along their
directed edges, then we shall have defeated the purpose of lateral
inhibition by having inhibiting signals arrive after the damage has
been done. Thus $\tau^-$ should be small. Lateral inhibition would be
most effective if $\tau^- = 0$, but we shall observe the constraints on
transmissions along directed edges set up in chapter 1. With these
arguments in mind, $\tau^-$ is selected to be:

$\tau^- = \tau/3 = 0.1$ sec

A rational guess for $\beta^-$ is difficult. In order to specify it most
efficiently we would need some idea of the average number of grid events
composing a pattern and the average number of events that compose
a random mistake. The reason for desiring this information when
selecting $\beta^-$ is obvious. Suppose that we had two patterns we wished
to teach to two outstars sharing the same grid. Pattern $\theta_1$ is
composed of one event. Pattern $\theta_2$ is composed of n events where
1 < n < N and N is the number of grid nodes. Then the node corresponding
to the event in pattern $\theta_1$ will not be inhibited at all. However,
the nodes corresponding to events in $\theta_2$ will inhibit each other
and the node responses to $\theta_2$ will have a diminished amplitude.
Thus the z correlations for $\theta_2$ will be smaller and it will require
many more instructions to learn $\theta_2$ than it will require to learn
$\theta_1$. Any selection for $\beta^-$ will work well in learning
$\theta_1$. On the other hand an excessively large $\beta^-$ will result
in very inefficient learning of $\theta_2$.
However, we want $\beta^-$ large enough to inhibit random mistakes.
Thus we are faced with a trade off between inefficient learning and
the proper degree of inhibition to counter mistakes. A foreknowledge
of the average situation to expect would greatly aid in the proper
selection of $\beta^-$. Of course, if we wanted our outstars to be
completely unbiased at the beginning of the experiment, we could make
a large number of them with various $\beta^-$ and turn them loose in the
environment. Survival of the fittest would soon select the optimal
$\beta^-$.

For the purposes of this study, it was decided to select $\beta^-$ on the
idea that at most two events would compose a pattern and a random
mistake on the average would consist of one event. $\beta^-$ was chosen to
allow the inhibitory signal from excited nodes to drive an unexcited
node to approximately one-half the amplitude of the excited node.
A brief analysis was made to meet this criterion as follows:


The maximum amplitude of an x process excited by a rectangular pulse
of amplitude A and duration $\delta = 1/\alpha$ was

$\max(x_i(t)) = (A/\alpha)(1 - e^{-1}) \approx 0.63\, A/\alpha$.

The amplitude of such an input resulting in $(1/2)\max(x_i(t))$ is
$A/2$, so that

$\beta^- (0.63)(A/\alpha) = (1/2) A$

or $\beta^- = \alpha/1.26$;

for $\alpha = 3.333$, $\beta^- = 2.64$.





An experimental check of this resulted in $\beta^- = 2.38$. The 11%
error is due both to the naiveté of the analysis and to errors inherent
to the digital simulations.
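The arithmetic of this brief analysis can be checked with a few lines
of code. The sketch below is illustrative only; the function names are
hypothetical, and it uses the unrounded value of $1 - e^{-1}$ rather
than the rounded 0.63 used above, so its analytic value differs
slightly in the third decimal place.

```python
import math


def peak_response(amp, alpha):
    """Peak of x for dx/dt = -alpha*x + amp applied for a time 1/alpha, x(0) = 0."""
    return (amp / alpha) * (1.0 - math.exp(-1.0))


def inhibition_coefficient(alpha):
    """Coefficient b such that b * peak_response(A, alpha) equals A / 2."""
    return alpha / (2.0 * (1.0 - math.exp(-1.0)))


alpha = 3.333
beta_analytic = inhibition_coefficient(alpha)
beta_measured = 2.38  # value reported from the experimental check
relative_gap = (beta_analytic - beta_measured) / beta_measured
print("analytic coefficient:", round(beta_analytic, 3))
print("gap relative to measured value (%):", round(100.0 * relative_gap, 1))
```

The relative gap computed this way is close to the 11% figure quoted
for the difference between the analysis and the experimental check.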

Inadvertently, v was changed to 2.4, resulting in a criterion of well
learning in one and one-half presentations. This minor discrepancy
is not sufficient to prevent comparison with experiment II. For
convenience the major parameters used are listed here:


Network parameters:

$\alpha$ = 3.3333 sec$^{-1}$ = $1/\delta$
$\beta$ = 1.0
u = 0.556
v = 2.44
$\tau$ = 0.3 sec
$\tau^-$ = 0.1 sec
$\beta^-$ = 2.38

Input pulse parameters:

A = 10
$\delta$ = 0.3 sec = $1/\alpha$


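The equations governing the performance of the laterally inhibiting
outstar are:

$\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\, x_c(t - \tau) - \beta^- \sum_{k \neq i} [x_k(t - \tau^-)]^+$

$\dot{z}_{ci}(t) = -u z_{ci}(t) + v\, [x_c(t - \tau)\, x_i(t)]^+$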


Experiment III was begun by teaching the outstar the pattern
$V_c \to V_2$ by two presentations of event 2, $\tau$ time units after
presentation of the command event. Figure 4.2.1 shows the result. The
prediction response on the $x_2(t)$ trace shows that $V_c \to V_2$ was
well learned, as is indicated by the $z_{c2}(t)$ trace. Figure 4.2.1
should be compared with figure 3.3.1, which shows the result of the
same pattern being taught to a practical simple outstar.

[Figure 4.2.1 traces: x(t) and z(t) processes versus time (secs).]

Figure 4.2.1. Teaching a laterally inhibiting outstar the pattern
$V_c \to V_2$.

Of interest in figure 4.2.1 is the fact that a minor association
of $V_c$ with $V_3$ did not occur to any significant extent. Event 1 was
presented with presentation phase $\phi = 0$ with respect to arrival of
the prediction signal at the arrowheads. That is, event 1 was
presented at the exact time instant that the prediction signal arrived
at the arrowheads. In the discussion of the phase-correlation curves
of section 3.5, we saw that presenting an event with presentation
phase $\phi = 0$ results in the greatest increase in the amplitudes of
the z process. In this sense, an event presented with presentation
phase $\phi = 0$ is learned best. Events presented with presentation
phase $\phi \neq 0$ are learned to a lesser extent. In figure 4.2.1,
event 3 was presented 0.6 seconds after event 1. That is, event 3 was
presented with presentation phase $\phi$ = 0.6 seconds = $2\delta = 2/\alpha$.
From the phase-correlation curve for a simple outstar without thresholds
in section 3.5, we saw that presenting an event with presentation
phase $\phi = \pm 2\delta = \pm 2/\alpha$ resulted in a significant
increase in the associated z process amplitude. The addition of
thresholds to the simple outstar prevented any increase in the
associated z process by cutting off the "tails" of the x processes.
The laterally inhibiting outstar currently under study does not
have thresholds. However, from the fact that only a very minor
association $V_c \to V_3$ was learned in figure 4.2.1, it appears that
lateral inhibition has some of the same effects that thresholds have on
the performance of an outstar. We will investigate this performance in
detail in section 4.4.

[Figure 4.2.2 traces: P(t), x(t), and z(t) processes versus time (sec).]

Figure 4.2.2. Resistance to random mistakes offered by a laterally
inhibiting outstar. $P_1(t)$ simulates a random mistake in the pattern
$V_c \to V_2$ learned previously. Note the inhibited response of $x_1(t)$
and the extinction of memory of the mistake revealed by $z_{c1}(t)$.

Experiment III was continued to check the claim that lateral
inhibition increases resistance to random mistakes. Figure 4.2.2
shows the result of presenting the previously learned pattern
$V_c \to V_2$ with a simulated random mistake, event 1. As the $x_1(t)$
and $z_{c1}(t)$ traces show, the mistake was inhibited to the point
where $V_c \to V_1$ was learned to only a minor extent. (Compare to
figure 3.2.3.) Additionally, predictions following the mistake
presentation resulted in the $x_1(t)$ process being totally inhibited.
This resulted in the memory of the mistake decaying towards extinction,
as shown by the $z_{c1}(t)$ trace. Of course the dramatic results shown
in figure 4.2.2 were due to the comparative freshness of the pattern
$V_c \to V_2$ in the outstar's memory, as shown by the large amplitude
for $z_{c2}(t)$ when the mistake was presented. The memory of
$V_c \to V_2$ will fade as $z_{c2}(t)$ decays. If the memory is
sufficiently faded, we will not expect the resistance to random
mistakes to be as good.

This has parallels in everyday experience. Students are less
likely to be deceived by a tricky question in an examination when the
subject matter is fresh in their minds.

Lateral inhibition does not prevent the outstar from correcting
a previously learned pattern which is in error. Experiment III was
continued to correct the previously learned pattern $V_c \to V_2$ with a
new pattern $V_c \to V_1$. Figure 4.2.3 shows the results. As can be seen,
two presentations of the new pattern were sufficient to totally
inhibit the old pattern and insure extinction of its memory.


[Figure 4.2.3 traces: P(t), x(t), and z(t) processes versus time (secs).]

Figure 4.2.3. Correcting the previously learned pattern $V_c \to V_2$
in a laterally inhibiting outstar. Note the extinction of $z_{c2}(t)$
due to the inhibited state of $x_2(t)$.





section 4.3 Advantage of Correcting a Learned Mistake with Lateral Inhibition


In section 3.4 we discussed the effects of the forgetting rate
u on a simple outstar's resistance to random mistakes and on its
ability to correct learned mistakes. The conclusion was that a small u
resulted in good random mistake resistance, but very low correctability.
A large u had the opposite effect. From the outstar's point of view,
the only difference between a random mistake and a correction to a
previously learned pattern is that the random mistake occurs
infrequently with the command event whereas the correcting pattern
usually occurs with the command event. It was shown in section 3.4 that
the outstar remembered the difference between an event which
infrequently occurs with the command event and one which usually occurs
with the command event in the accumulated past experience contained in
the amplitudes of its z processes. With a small u the past experience
was not forgotten rapidly and resulted in a great accumulation of
experience. It was not surprising that an infrequent variation in the
pattern had a small effect on the accumulated experience. On the other
hand, a great accumulation of past experience with a pattern makes it
very difficult to convince the outstar that the pattern was an error.
Due to the fast rate of forgetting past experience in the large u
outstar, little accumulation of experience occurred, resulting in its
random mistake resistance and correctability properties. Thus by
interpreting the amplitudes of the z processes as accumulated past
experience it seemed very reasonable to conclude that good random
mistake resistance and correctability were incompatible.





Figures 4.2.2 and 4.2.3 show that this need not be the case.
The laterally inhibiting outstar has both good resistance to
random mistakes and good correctability. Lateral inhibition was
introduced to make an outstar with a fast forgetting rate more
resistant to random mistakes. It might have been expected that this
would decrease its correctability. We shall inquire why it did not.

In the slowly forgetting simple outstar the only way a pattern
can be corrected is by brute force. The amplitude of grid node x
process responses is a linear function of the sum of the event input
pulses and prediction signal inputs:

$\dot{x}_i(t) = -\alpha x_i(t) + \beta z_{ci}(t)\, x_c(t - \tau) + P_i(t)$

Thus the amplitude of a grid node x process response is greater when
there is an event input pulse than when there is a prediction
signal input alone. Therefore the correlating signal
$v\, x_c(t - \tau)\, x_i(t)$ is greater when there is an event input pulse
and the z process grows faster. In correcting a pattern in a slowly
forgetting simple outstar, we simply stop presenting the events of the
erroneous pattern and start presenting the events of the correcting
pattern. As was shown in figure 3.2.2, the additional amplitude of the
grid node x processes due to the correcting event input pulses results
in the z processes associated with the correcting pattern events
growing faster than the z process associated with the erroneous pattern.
By the outstar theorem we are assured that eventually the probabilities
$X_i(t)$ and $y_i(t)$ will go from values describing the erroneous
pattern to values describing the correcting pattern. However, we have
seen that in the slowly forgetting outstar, the x process amplitudes
will have become impractically large long before this happens.





In the rapidly forgetting outstar, we do not have the problem
of impractically large amplitude x processes. Further, in trying
to correct a previously learned pattern we are aided by the rapid
forgetting rate. In addition to the effects of the brute force
correcting process, the rapidly forgetting outstar forgets the erroneous
pattern while it is learning the correcting pattern. (Provided, of
course, the excitations of the command node are spaced far enough
apart not to result in significant pumping up of the erroneous pattern.)
Thus in addition to the active process of forcing the z processes
associated with the correcting pattern to grow larger than those
associated with the erroneous pattern, there is the passive process of
forgetting the old pattern. As has been emphasized, this passive
forgetting process results in the better correctability of the rapidly
forgetting simple outstar as well as its low resistance to random
mistakes.

In the laterally inhibiting outstar, we retained the fast
forgetting rate to control grid node x process amplitudes. Thus we
have both the active brute force correcting process and the passive
forgetting process working to correct an erroneous pattern. If we
look closely at figures 4.2.2 and 4.2.3, we can see the effect of
lateral inhibition in both random mistake correction and pattern
correction. In figure 4.2.2, presentation of the previously learned
pattern $V_c \to V_2$ with the simulated random mistake $V_c \to V_1$
resulted in growth of both $z_{c2}(t)$ and $z_{c1}(t)$. However, the sum
of the input pulse $P_2(t)$ and the input prediction signal
$\beta z_{c2}(t)\, x_c(t - \tau)$ drove $x_2(t)$ to a greater amplitude
than $x_1(t)$ was driven by $P_1(t)$ alone. Therefore $x_1(t)$ was
diminished by the inhibiting signal from $x_2(t)$ and $z_{c1}(t)$ did
not grow to a very large amplitude. Both $z_{c1}(t)$ and $z_{c2}(t)$
decayed. On subsequent predictions the prediction input signal for
$V_1$ was not sufficient to overcome the inhibitory signal from $V_2$
and the correlating signal $v[x_c(t - \tau)\, x_1(t)]^+$ was zero. Thus
$z_{c1}(t)$ was unable to grow on subsequent predictions and the fast
forgetting rate insured that the random mistake would be totally
forgotten.

We have said that a random mistake occurs infrequently. Thus
we can expect that the fast forgetting rate will insure that the random
mistake will be forgotten before it occurs again during presentation
of the pattern and there will be no accumulation of experience with the
mistake in $z_{c1}(t)$. Now, look at the successful correction of the
previously learned pattern $V_c \to V_2$ with $V_c \to V_1$ in figure
4.2.3. It is seen that on the first presentation of the correcting
pattern, the accumulated experience with the erroneous pattern was
still greater than the experience accumulated on the first presentation
of the correcting pattern. At this point the outstar could not be aware
that $V_c \to V_1$ is a correcting pattern and not a random mistake.
However, on the next presentation of $V_c \to V_1$, the experience now
accumulated with $V_c \to V_1$, coupled with the event input pulse, is
sufficient to drive $x_1(t)$ to a greater amplitude than prediction
alone can drive $x_2(t)$. Consequently, $x_2(t)$ is inhibited and
$z_{c2}(t)$ grows very little. The fast forgetting rate now insures that
$z_{c2}(t)$ will decay to a point where a third presentation of the
correcting pattern will completely inhibit prediction of $V_c \to V_2$,
and it is impossible thereafter to accumulate any more experience with
$V_c \to V_2$ by prediction. At this point we can say that the pattern
has been corrected.


It is a combination of brute force correcting resulting in
accumulation of experience with the correcting pattern, rapid forgetting
of the erroneous pattern, and use of accumulating experience to
inhibit the erroneous pattern which accounts for the correctability
property of a laterally inhibiting outstar. The same combination
of processes results in its random mistake resistance. It is the
inability of a simple outstar to couple accumulation of experience
with forgetting that results in the incompatibility of random mistake
resistance with correctability.

Because of the inability to control the amplitudes of grid node
responses with small u's we will not undertake to study the variation
of these properties in a laterally inhibiting outstar with a fast
forgetting rate. In chapter six, we will present a different
formulation of the outstar equations which control the amplitudes of the
grid node responses independent of the amplitudes of the z processes
and incorporate a form of lateral inhibition. At that time we will
consider the effect of decreasing the forgetting rate on the properties
of a laterally inhibiting outstar.





section 4.4 Further Remarks on Local Lateral Inhibition


In the first part of experiment III, figure 4.2.1, it was noted
that lateral inhibition appears to have some of the same effects
that thresholds have on the performance of outstars. The evidence
was that presentation of event 3, 0.6 seconds = $2\delta = 2/\alpha$ after
arrival of the prediction signal at the arrowheads, did not result in
any learning of $V_c \to V_3$. Further investigation shows that this
result is of dubious value.

The $x_3(t)$ trace in figure 4.2.3 shows the inhibitory response of
a node to a single input pulse at another node in the grid. The
maximum of this inhibitory response occurs approximately
$2\delta = 2/\alpha$ time units after arrival of the inhibitory signal at
the node. Thus the maximum inhibitory response occurs at approximately
$\tau^- + 2/\alpha$ time units after beginning excitement of the other
node. The result is maximum inhibition of events presented a little
less than $\tau^- + 2\delta$ after beginning excitation of a grid node.
Now, if an event had been presented $\tau^- + 2\delta$ before arrival of
the prediction signal, the event presented with $\phi = 0$ presentation
phase relative to the arrival of the prediction signal at the
arrowheads would have been most inhibited and little learning of this
event would have resulted. In effect, this means that to avoid
inhibiting an event to be associated with the command event of one
outstar sharing the grid with other outstars, the interval between
event presentations must be greater than approximately
$\tau^- + 4/\alpha$ time units.

The reason for the maximum of an inhibitory response occurring
so long after excitation of a node can be seen analytically. If the
total input signal, $I_i(t)$, to an embedding field network node $V_i$
is a linear, time invariant function of time, then the node's x process
has a transfer function $1/(s + \alpha)$, such that:

$X_i(s) = I_i(s)/(s + \alpha)$.

A cascade of n nodes has a transfer function of $(1/(s + \alpha))^n$. Due
to the short duration of our input pulses, we are dealing essentially
with the transient response of the x process. Thus the inverse
transform of $1/(s + \alpha)^n$ is a good indication of what our pulse
should look like after having traveled through n nodes.


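$\mathcal{L}^{-1}\left[\dfrac{1}{(s + \alpha)^n}\right] = \dfrac{t^{n-1} e^{-\alpha t}}{(n-1)!} = x_n(t)$

The maximum of this occurs at:

$\dfrac{dx_n(t)}{dt} = \dfrac{t^{n-2} e^{-\alpha t}}{(n-1)!}\,[(n-1) - \alpha t] = 0$

or

$t = (n-1)/\alpha$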


Thus the more nodes a pulse travels through, the later its maximum
occurs. Of course, the input signal to a grid node in a laterally
inhibiting outstar is partially non linear. However, if we consider
that the z processes vary slowly enough so that we can consider them to
be approximately constant, then the above analysis approximately holds.
Thus in the case where an input is given to one node which inhibits
another, we have an n = 2 node cascade, and:

$x_2(t) = t e^{-\alpha t}$

with maximum at:

$t = 1/\alpha$ after arrival of the inhibitory signal.


Now, if we add a prediction signal from the command node also, we have
an n = 3 node cascade and:

$x_3(t) = \tfrac{1}{2} t^2 e^{-\alpha t}$

with maximum at:

$t = 2/\alpha$ after arrival of the inhibitory signal.

[Figure 4.3.1 traces: P(t), x(t), and z(t) processes versus time.]

Figure 4.3.1. Result of choosing $\beta^-$ too large for the number of
events to be learned in a pattern. v was selected to result in well
learning in one presentation for a pattern composed of one event. The
inhibitory effect here is too large to learn a three event pattern
efficiently.

Thus the occurrence of the maximum inhibitory response between
$\tau^- + 1/\alpha$ and $\tau^- + 2/\alpha$ is inherent to the network
according to the approximate analysis. The experimental evidence shows
that this approximate analysis is reasonably correct. We will have
further occasion to consider this "lengthening" of pulses as they go
through successive nodes when we study outstar avalanches.
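The location of the maximum of the cascade response can be confirmed
numerically. The sketch below evaluates the closed form
$t^{n-1} e^{-\alpha t}/(n-1)!$ on a uniform grid and searches for its
peak; the function names, the grid density, and the search interval are
assumptions made for this illustration.

```python
import math


def cascade_impulse_response(n, alpha, t):
    """Inverse Laplace transform of 1/(s + alpha)**n evaluated at t >= 0."""
    return t ** (n - 1) * math.exp(-alpha * t) / math.factorial(n - 1)


def peak_time(n, alpha, t_end=None, samples=20000):
    """Time of the largest sampled value of the n-node cascade response."""
    if t_end is None:
        t_end = 4.0 * n / alpha  # comfortably past the expected peak
    best_t = 0.0
    best_x = cascade_impulse_response(n, alpha, 0.0)
    for k in range(1, samples + 1):
        t = t_end * k / samples
        x = cascade_impulse_response(n, alpha, t)
        if x > best_x:
            best_t, best_x = t, x
    return best_t


if __name__ == "__main__":
    alpha = 3.3333
    for n in (1, 2, 3, 4):
        print(n, peak_time(n, alpha), (n - 1) / alpha)
```

For each n the sampled peak falls within one grid step of
$(n-1)/\alpha$, in agreement with the analysis above.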

The earlier prediction that a $\beta^-$ suitable for learning a pattern
of one event results in inefficient learning of patterns with more than
one event was tested. An N = 4 grid node laterally inhibiting outstar
was used. v = 3.2 was selected to result in well learning of one
event in one presentation and this was experimentally verified. All
other parameters were the same as in experiment III. The initial
conditions on the z processes were reset to zero. Three events were
presented to the grid $\tau$ time units after excitation of the command
node. A prediction was requested 1/u time units later. The results
are shown in figure 4.3.1. The pattern $V_c \to (V_1, V_2, V_3)$ was
learned very poorly. From this evidence it can be concluded that it
would require many more rapid presentations of this pattern to result
in well learning.

Of course with lateral inhibition any $\beta^- > 0$ will result in faster
learning of a pattern with fewer events. If we consider the number of
elemental events as a measure of the complexity of a pattern, then this
effect translates into the statement that a complicated pattern
is harder to learn. A laterally inhibiting outstar has some of the
same drawbacks as the human mental process.





CHAPTER 5 THE OUTSTAR AVALANCHE


section 5.1 Introduction


In section 2.1 the outstar avalanche was briefly introduced.
Its geometric schematic and equations were shown in figure 2.1.2,
which is here repeated for convenience. The basic idea behind the
avalanche is to arrange the command nodes of many outstars in a
linear cascade. Excitement of the first node in the cascade results
in a prediction signal arriving at the jth command node of the cascade
$j\tau$ time units later. Thus each outstar in the avalanche takes a
picture of the time varying pattern on the grid at integer multiples
of $\tau$. The result is that the avalanche can learn and reproduce a
sampled data approximation of a time varying pattern of events. The
starting command node in the cascade represents an event which is
associated with the start of the time varying pattern.

The linear command node cascade essentially acts as a clock to
determine when the data samples are taken. In order to perform the
function we would want the response of each node in the cascade to
the prediction signal from the node immediately before it to be
approximately the same as that of every other node. This is, however,
not the case with the outstar avalanche arrangement shown in figure
2.1.2. The reason was discussed in section 4.4 where we noticed that the
response of nodes in a cascade got longer the more nodes a signal
passed through. Based on the transient response of such a linear
cascade, we analytically computed that the maximum of the nth node's
response occurred at $(n - 1)/\alpha$. A short experiment was conducted to
test this result. Figure 5.1.1 shows a linear cascade of four nodes
excited by a rectangular pulse at node $V_1$. As can be seen from the
traces $x_1(t)$ through $x_4(t)$, the responses did lengthen by
approximately $(n - 1)/\alpha$. The equations used in this experiment
were:

$\dot{x}_1(t) = -\alpha x_1(t) + P_1(t)$

$\dot{x}_i(t) = -\alpha x_i(t) + \beta x_{i-1}(t - \tau)$ for $i \ge 2$

[Figure 2.1.2: geometric schematic of the outstar avalanche (starting
node, command node cascade, grid nodes) with equations 2.1.4 through
2.1.7 governing network performance.]

Figure 2.1.2. An outstar avalanche and the equations
governing its performance.

[Figure 5.1.1: schematic, equations, and traces of the command node
responses versus time (in units of $1/\alpha$).]

Figure 5.1.1. Response of an outstar avalanche command node cascade.
Note the lengthening of the pulse as it travels through successive
nodes.


The growing amplitude of successive node responses in figure
5.1.1 is due to the fact that β was selected to result in the x_2(t)
response being of approximately the same maximum amplitude as the
x_1(t) response. For the parameter selection shown in figure 5.1.1,
this resulted in a β of:

    \beta = \alpha / (1 - e^{-1})

However, the steady state response of a node with transfer function
1/(s + α) to a step input is to amplify the step's amplitude by 1/α.
Thus, in order to maintain approximately equal amplitude responses in
a cascade, β should be selected to be

    \beta = \alpha

The β in the experiment shown in figure 5.1.1 was too large and
resulted in the amplitude growth shown.
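
The lengthening and the amplitude growth described above can be checked with a small numerical sketch. The following is a forward-Euler integration of the cascade equations, not the program used in the thesis. The step size, the zero transmission delay, the function name `simulate_cascade`, and the choice of α/(1 - e^{-1}) as the "too large" β are assumptions made for illustration.

```python
import math


def simulate_cascade(alpha, beta, n_nodes=4, amp=10.0, delta=None,
                     tau=0.0, dt=0.001, t_end=6.0):
    """Forward-Euler integration of the linear command node cascade.

    Returns (peak_times, peak_values), one entry per node.
    """
    if delta is None:
        delta = 1.0 / alpha             # pulse width equal to the node rise time
    lag = int(round(tau / dt))          # transmission delay in whole steps
    steps = int(round(t_end / dt))
    x = [0.0] * n_nodes
    history = [x]                       # history[k][i] is x_i at step k
    peak_t = [0.0] * n_nodes
    peak_x = [0.0] * n_nodes
    for k in range(steps):
        t = k * dt
        pulse = amp if t < delta else 0.0
        past = history[k - lag] if k >= lag else [0.0] * n_nodes
        new_x = []
        for i in range(n_nodes):
            # node 1 sees the input pulse, later nodes see the delayed
            # response of the node before them
            drive = pulse if i == 0 else beta * past[i - 1]
            new_x.append(x[i] + dt * (-alpha * x[i] + drive))
        x = new_x
        history.append(x)
        for i in range(n_nodes):
            if x[i] > peak_x[i]:
                peak_x[i], peak_t[i] = x[i], t + dt
    return peak_t, peak_x


ALPHA = 10.0 / 3.0
# beta = alpha: unity steady state gain per node
t_unity, x_unity = simulate_cascade(ALPHA, beta=ALPHA)
# assumed "too large" beta, chosen for illustration
t_large, x_large = simulate_cascade(ALPHA, beta=ALPHA / (1.0 - math.exp(-1.0)))
```

With these assumptions the peak of each node arrives later than the peak of the node before it, and only the larger β produces peaks that grow from node to node.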

The inadvertent amplitude growth in figure 5.1.1 does not detract
from the basic result. A linear cascade of command nodes for an
avalanche is unsatisfactory due to the lengthening of command
node responses. In fact, this effect renders a complex network of
embedding field elements requiring transmission of signals through
many nodes rather impractical. In a later chapter we shall address
this problem directly, but for the time being we shall sidestep it
by introducing a differently configured avalanche.

Figure 5.1.2. An outstar avalanche using a single starting command
node, a long axon, and collaterals. Note that jτ is the time elapsed
from excitement of the starting command node V_c to arrival of the
prediction signal x_c(t - jτ) at the arrowheads of the jth collateral
group. (Schematic labels: long axon, first group of collaterals, group
of collaterals, grid nodes; the figure also lists equations 5.1.1
through 5.1.3 governing network performance.)

Figure 5.1.2 shows an avalanche which performs the same theoretical
function as that pictured in figure 2.1.2 without the pulse
lengthening effects. The names given to the new
elements of figure 5.1.2 were suggested by the geometric arrangement
of the nervous system in the cerebellum of vertebrates. The long
axon is a long directed edge. At periodic points along the long axon,
the directed edge splits into a continuation of the long axon and a
group of N branches of the directed edge called a collateral group.
Each of the collaterals has an arrowhead impinging on a grid node.
The distance from the starting command node, V_c, to the arrowheads
of the jth collateral group is so arranged that the time elapsed from
excitement of the starting command node to arrival of the prediction
signal at these arrowheads is jτ time units. In each collateral
arrowhead is located a z process for correlating the prediction signal
x_c(t - jτ) with the grid node responses. This long axon and
collateral geometry performs the clock function of the avalanche.


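For ease of reference, the equations for this avalanche are
given here:

5.1.1    \dot{x}_c(t) = -\alpha x_c(t) + P_c(t)

5.1.2    \dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta \sum_{j} z_{ji}(t) x_c(t - j\tau)

5.1.3    \dot{z}_{ji}(t) = -u z_{ji}(t) + v x_i(t) x_c(t - j\tau)

The sum in 5.1.2 runs over the collateral groups j.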

Equation 5.1.2 is for the response of a grid node in a simple outstar.
We will perform a simple experiment on an avalanche with this
formulation and then change equation 5.1.2 to incorporate lateral
inhibition in our avalanche. The two avalanches thus formed will be
called a simple avalanche and a laterally inhibiting avalanche.





Time does not permit an exhaustive study of avalanches. This
chapter on avalanches is an illustration of the results and problems
of using the outstars studied previously in an avalanche.

Section 5.2 A Simple Avalanche


In this section we will use a simple avalanche to learn a time
varying pattern of events. In designing a simple avalanche to do
this, we must first ask what sort of time varying pattern we are
going to have it learn. If we have M collateral groups in our
avalanche and N grid nodes, we must keep track of M x N z processes
during the experiment. To conserve computation time, M x N should
be small. An avalanche with M = 3 collateral groups and N = 2 grid
nodes is chosen. It would be rather unrealistic to expect an avalanche
which takes only three sample data points to approximate a continuous
time varying pattern. Thus we will try to learn a series of time
discrete events. That is, we allow the possibility of the occurrence
of the two events associated with the grid nodes in the environment.
We assume that the events represent time discrete events such as
depressing the key of a piano. We further assume that there is a
minimum time between occurrence of separate patterns of these events
and we synchronize the avalanche's sampling interval τ with this
minimum interval. To simplify the experiment still further, we shall
indicate the occurrence of these events with equal amplitude
rectangular input pulses to the appropriate grid node and follow the
convention of the past chapters by making the pulse duration δ equal
to the rise time of a node's response:

    \delta = 1/\alpha

With this specification of the allowable input patterns, we have
made the results of the previous chapters applicable to the avalanche.
The other parameters will be specified accordingly:

    α = 3.3333 sec^-1
    β = 1
    v = 1.6 (two presentations for well learning criterion)
    A = 10
    δ = 1/α = 0.3 sec.

Figure 5.2.1. Results of an experiment with a simple outstar avalanche.
(Response traces plotted against time in seconds, 0 to 12.)

We want τ to be large enough to avoid significant overlapping
of the "pictures" taken by each collateral group. From the
phase-correlation curves of section 3.5, τ = 3/α = 3δ should work.
Thus τ is selected to be:

    τ = 3/α = 0.9 sec.

The memory decay time 1/u is specified to be the time between
successive presentations and/or predictions of the pattern. Thus:

    u = 1/(4 sec.) = 0.25 sec^-1

Figure 5.2.1 shows the pattern presented to the avalanche and the
results. The pattern was presented twice. Symbolically, the pattern
presented was:

    V_c → (V_1, V_2), (V_1, 0), (V_2, 0)

The grid node responses following t = 8.8 seconds are the avalanche's
learned prediction of the pattern elicited by the excitement of the
starting command node alone at t = 7.9 seconds.

As can be seen, the avalanche's prediction is not an unqualified
success. Of course α is too small to approximate the input pulses
with any degree of accuracy. Nonetheless, grid node V_1 did respond
with two large amplitude responses in a row and grid node V_2 responded
with large responses spaced 2τ apart as in the input pattern. However,
the third response of x_1(t) and the second response of x_2(t) show
that the avalanche has noticeable "picture overlapping" error problems.
Increasing α and/or using thresholds would result in a better
approximation.
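
The experiment of this section can be approximated with a short forward-Euler sketch of equations 5.1.1 through 5.1.3. It is only an illustration under stated assumptions, not the thesis program: the presentation times (0 and 4 seconds), the alignment of each grid pulse with the arrival of the prediction signal at its collateral group, the step size, and the name `simulate_simple_avalanche` are all assumed here.

```python
def simulate_simple_avalanche(alpha=10.0 / 3.0, beta=1.0, u=0.25, v=1.6,
                              amp=10.0, tau=0.9, dt=0.001, t_end=8.0):
    """Forward-Euler sketch of equations 5.1.1 to 5.1.3 with M = 3, N = 2.

    Returns z[j][i], the z process of collateral group j + 1 and grid
    node i + 1, after two presentations of the pattern.
    """
    delta = 1.0 / alpha
    # pattern[j] lists the grid nodes (0-based) excited in sampling slot j + 1
    pattern = [(0, 1), (0,), (1,)]
    starts = (0.0, 4.0)                      # assumed presentation times
    n_groups, n_nodes = len(pattern), 2
    lag = int(round(tau / dt))
    xc, x = 0.0, [0.0] * n_nodes
    z = [[0.0] * n_nodes for _ in range(n_groups)]
    xc_hist = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        xc_hist.append(xc)
        # delayed prediction signals x_c(t - j*tau) for j = 1..M
        taps = [xc_hist[k - (j + 1) * lag] if k >= (j + 1) * lag else 0.0
                for j in range(n_groups)]
        pc = amp if any(s <= t < s + delta for s in starts) else 0.0
        p = [0.0] * n_nodes
        for s in starts:
            for j, nodes in enumerate(pattern):
                onset = s + (j + 1) * tau
                if onset <= t < onset + delta:
                    for i in nodes:
                        p[i] = amp
        dx = [-alpha * x[i] + p[i]
              + beta * sum(z[j][i] * taps[j] for j in range(n_groups))
              for i in range(n_nodes)]
        dz = [[-u * z[j][i] + v * x[i] * taps[j] for i in range(n_nodes)]
              for j in range(n_groups)]
        xc += dt * (-alpha * xc + pc)
        x = [x[i] + dt * dx[i] for i in range(n_nodes)]
        z = [[z[j][i] + dt * dz[j][i] for i in range(n_nodes)]
             for j in range(n_groups)]
    return z


z_learned = simulate_simple_avalanche()
matched = [z_learned[0][0], z_learned[0][1], z_learned[1][0], z_learned[2][1]]
unmatched = [z_learned[1][1], z_learned[2][0]]
```

In this sketch the `unmatched` entries are not zero. They are the correlation between a grid response and the tail of a neighbouring sampling interval, which is the "picture overlapping" error discussed above.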


Section 5.3 A Laterally Inhibiting Avalanche


Although the results with a simple avalanche were not encouraging,
equations 5.1 were modified to produce a laterally inhibiting
avalanche for comparison. To convert a simple avalanche to a laterally
inhibiting one, inhibiting directed edges between the grid nodes must
be added and equations 5.1 changed to:

5.3.1    \dot{x}_c(t) = -\alpha x_c(t) + P_c(t)

5.3.2    \dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta \sum_{j} z_{ji}(t) x_c(t - j\tau)
                        - \beta^- \sum_{k \neq i} [x_k(t - \tau^-)]^+

5.3.3    \dot{z}_{ji}(t) = -u z_{ji}(t) + v [x_i(t)]^+ x_c(t - j\tau)

where:

    [y]^+ = y if y > 0
    [y]^+ = 0 if y \leq 0

Figure 5.3.1 shows the results of performing the experiment of
section 5.2 on a laterally inhibiting avalanche. The parameters used
in this experiment were the same as those in section 5.2 except that
v = 2.4 as in the study of the laterally inhibiting outstar. τ^- and
β^- are the same as in that study:

    τ^- = 0.1 sec.
    β^- = [illegible]


The prediction response of the grid nodes following t = 8.8
seconds in figure 5.3.1 shows that the pattern learned by the avalanche
is definitely not the pattern taught to it. Briefly analyzing the
reasons for this failure, we can see that the deleterious effects of
lateral inhibition all acted in concert. First, the fact that
lateral inhibition diminishes the amplitude of a node's response
when more than one node is excited at the same time resulted in
responses to presentation of both of the events at the same time at
the beginning of the pattern being diminished. This resulted in a
smaller correlation amplitude for z_11(t) and z_12(t) when compared to
z_21(t), which was the result of the uninhibited response to the
presentation of event 1 alone as the second event of the pattern.
The first two responses of the prediction response of x_1(t) show
this effect.

Secondly, the lengthening of the negative amplitude inhibitory
responses due to transmittal through several nodes resulted in a large
inhibitory response in x_2(t) when event 2 was presented alone as the
third event of the pattern. This resulted in a small correlation
amplitude for z_32(t) which was insufficient to drive x_2(t) positive
in the prediction at the appropriate time.

Additionally, the errors associated with "picture overlapping"
combined with the above resulted in x_1(t) responding to a third event
that was not in the pattern.

Figure 5.3.1. Results of an experiment with a laterally inhibiting
outstar avalanche. (Response traces plotted against time in seconds,
0 to 12.)

If an attempt were made to improve the laterally inhibiting
avalanche's performance, β^- should be reduced. It is noted that if the
pattern had been composed on the average of a large number of events
at each sampling with only a few events changing between samplings, the
amplitude diminishing effect would not have been as serious. Due
to the large number of nodes in such a pattern, the resistance to
random mistakes composed of a small number of events would not be
compromised with a smaller β^-.

Both to avoid the inhibitory response lengthening and "picture
overlapping" errors, the interval between samples, τ, should be
increased. Of course this last suggestion seriously compromises the
ability of a laterally inhibiting outstar to accurately approximate
a rapidly varying pattern. Thus solution of the response
lengthening problem of a signal that must be transmitted through
several nodes is important. A solution will be proposed in a later
chapter.

The avalanches presented in this chapter were for illustrative
purposes to show some of the problems encountered when outstars are
combined into an avalanche. Rather than dwelling upon the design
improvements which could be made to the avalanches, we will go on
to consider other formulations of outstars which are the basic
components of an avalanche.

CHAPTER 6 THE VIRTUAL LATERALLY INHIBITING OUTSTAR

Section 6.1 Other Outstars Which Control the Maximum Amplitudes
of Grid Node Responses


Lateral inhibition was added to the simple outstar as a means
of using past experience to suppress random mistakes in a pattern.
Its addition was necessitated by the rapid forgetting rate required
to control the amplitudes of prediction responses. There are methods
by which the amplitudes of prediction responses can be controlled
other than by allowing a fast forgetting rate. We will review a few
of them as illustrations of different formulations of the equations
for an outstar and then investigate one of them.

One method of controlling the amplitudes of prediction responses
would be to place an upper bound on the z processes:

6.1.1    \dot{x}_c(t) = -\alpha x_c(t) + P_c(t)

6.1.2    \dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t) x_c(t - \tau)

6.1.3    \dot{z}_{ci}(t) = -u z_{ci}(t) + [M_0 - z_{ci}(t)]^+ v x_i(t) x_c(t - \tau)

where:

    [y]^+ = y if y > 0
    [y]^+ = 0 if y \leq 0

Equation 6.1.3 limits z_ci(t) to values between 0 and M_0. M_0 is
specified such that β M_0 x_c(t - τ) produces the maximum grid response
amplitude we are willing to tolerate. This method has limited random
mistake resistance. However, if we specify v such that it requires
several presentations of a pattern to drive a z process to M_0, then
the occurrence of one random mistake will result in a relatively small
z amplitude. If u is specified to result in a memory decay time 1/u
approximately equal to the average time interval between consecutive
occurrences of the same random mistake, then equations 6.1.1 through
6.1.3 describe an outstar which has a relatively slow forgetting rate
and amplitude control of the grid node responses. However, if an
outstar governed by this set of equations is confronted with a random
mistake and is then asked to predict the pattern rapidly for a
prolonged period, we can expect the pumping up process to saturate all
the z processes at value M_0, including the z process associated with
the mistake. Thus upper bounding the z processes to insure that
the amplitudes of prediction responses remain tolerable is not very
useful for an outstar functioning in a noisy environment. Additionally,
we could expect that use of a small u would result in poor
correctability, as in the simple outstar.
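
The saturation argument can be illustrated with a few lines of code. This is a sketch only: the product x_i(t) x_c(t - τ) is replaced by a constant `drive`, which is an assumption standing in for prolonged, rapid prediction, and the bound `m0`, the function name `bounded_z`, and the numerical values are hypothetical.

```python
def bounded_z(z0, drive, m0, u, v, dt=0.001, t_end=20.0):
    """Euler integration of dz/dt = -u*z + [m0 - z]^+ * v * drive.

    `drive` is a constant stand-in for the product x_i(t) * x_c(t - tau).
    """
    z, trace = z0, []
    for _ in range(int(round(t_end / dt))):
        z += dt * (-u * z + max(m0 - z, 0.0) * v * drive)
        trace.append(z)
    return trace


# A z process that starts from a small "mistake" amplitude still climbs
# to just below the bound when it is driven for long enough.
trace = bounded_z(z0=0.1, drive=2.0, m0=5.0, u=0.01, v=1.6)
```

Under these assumptions the z process never exceeds the bound but ends up very close to it regardless of how small it started, which is why the bound alone does not protect against a random mistake.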

A more direct method of controlling the amplitudes of prediction
responses would be to upper bound the grid x processes:

6.1.4    \dot{x}_c(t) = -\alpha x_c(t) + P_c(t)

6.1.5    \dot{x}_i(t) = -\alpha x_i(t) + [M_0 - x_i(t)]^+ (P_i(t) + \beta z_{ci}(t) x_c(t - \tau))

6.1.6    \dot{z}_{ci}(t) = -u z_{ci}(t) + v x_c(t - \tau) x_i(t)

By specifying u in equation 6.1.6 to be small, the outstar governed
by equations 6.1.4 through 6.1.6 would be able to absorb random
mistakes in its experience as did the simple outstar with a slow
forgetting rate in chapter three. The bound on the grid node's x
processes in equation 6.1.5 insures that this outstar will not have
the uncontrolled growth of prediction responses that the slowly
forgetting simple outstar did. However, in this outstar, a large
z_ci(t) would result in a maximum prediction input signal to a grid
node of magnitude M_0 for as long as z_ci(t) x_c(t - τ) ≥ M_0. Because
the prediction signals have exponentially decaying tails, this would
result in the effective duration of the maximum prediction signal
input getting longer as the z_ci(t) process got larger. Thus while
being able to control the amplitude of grid node prediction responses,
we would not be able to control the duration of the responses. In an
outstar, we have absolute control over the shape and amplitude of the
prediction signal x_c(t - τ) by control of the input pulse to the
command node. Thus by specifying the input pulses we can analytically
compute what the prediction signal looks like. With this knowledge,
a threshold Γ could be placed on the command node to guarantee that
the prediction signal [x_c(t - τ) - Γ]^+ is non zero only over a
specified interval of time. By so restricting the duration of the
prediction signal we could also limit the duration of the grid node's
prediction responses. Again the small u resulting in good random
mistake resistance could be expected to result in poor correctability.
The properties of such an outstar would be interesting to investigate
but time did not allow an investigation in this study.
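
As an illustration of how a command node threshold bounds the duration of the prediction signal, the interval over which the response to a rectangular pulse exceeds a threshold can be computed in closed form. The sketch below assumes the first-order node response used throughout (rise (A/α)(1 - e^{-αt}) during the pulse, exponential decay afterwards). The function name `suprathreshold_interval` and the threshold values are hypothetical.

```python
import math


def suprathreshold_interval(gamma, amp=10.0, alpha=10.0 / 3.0, delta=None):
    """Return (t_on, t_off), the interval on which x_c(t) > gamma.

    x_c(t) is the response of a node with decay rate alpha to a
    rectangular pulse of amplitude amp and duration delta starting at
    t = 0. Returns None if the response never exceeds gamma.
    """
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    if delta is None:
        delta = 1.0 / alpha
    peak = (amp / alpha) * (1.0 - math.exp(-alpha * delta))
    if gamma >= peak:
        return None
    # rising phase: (amp/alpha) * (1 - exp(-alpha * t)) = gamma
    t_on = -math.log(1.0 - gamma * alpha / amp) / alpha
    # decaying phase: peak * exp(-alpha * (t - delta)) = gamma
    t_off = delta + math.log(peak / gamma) / alpha
    return t_on, t_off


low = suprathreshold_interval(0.5)
high = suprathreshold_interval(1.5)
never = suprathreshold_interval(2.0)
```

Raising the threshold shortens the interval, and a threshold above the peak response suppresses the prediction signal entirely.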

Another method of controlling the grid node prediction response
amplitude, which we will study, would be to make the prediction input
signal to the grid nodes linearly proportional to the probabilities
y_i(t) which define the outstar's memory of a pattern. By the outstar
theorem, the y_i(t) converge to the pattern probabilities θ_i, which
are constant. Thus when the y_i(t) have converged sufficiently close
to the θ_i, we could expect the prediction signal inputs to the grid
nodes, β y_i(t) x_c(t - τ), to be the same independent of the amplitudes
of the z_ci(t) processes. As y_i(t) ≤ 1, specifying β would determine
the maximum possible prediction amplitude of the grid node's responses.
Additionally, specifying the u of the z processes to be small would
allow absorption of random mistakes in accumulated past experience.
The equations for such an outstar are:

6.1.7    \dot{x}_c(t) = -\alpha x_c(t) + P_c(t)

6.1.8    \dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta y_i(t) x_c(t - \tau)

6.1.9    \dot{z}_{ci}(t) = -u z_{ci}(t) + v x_c(t - \tau) x_i(t)

6.1.10   y_i(t) = z_{ci}(t) / \sum_{k=1}^{N} z_{ck}(t)

Another attractive property of an outstar governed by these equations
is that equal prediction signals will result in equal grid node
responses independent of the amplitudes of the z processes. Thus we
could say that the memory of a pattern is always fresh in such an
outstar's memory and pumping up is not required.

A close examination of equations 6.1.7 through 6.1.10 shows
that an outstar governed by these equations is a laterally inhibiting
outstar. By lateral inhibition we mean the ability of a grid node
responding with large amplitude to diminish the amplitude of grid
nodes responding with lesser amplitudes. From equation 6.1.9, a grid
node responding with a large amplitude will result in a large
correlating amplitude for the associated z_ci(t) process. This will
result in a large probability y_i(t) from equation 6.1.10 which in turn
will allow a larger prediction signal input in equation 6.1.8. At the
same time a large z_ci(t) will result in a smaller y_k(t) for nodes
not responding with large amplitudes by the inclusion of z_ci(t) in
the denominator of equation 6.1.10 for y_k(t). This in turn will result
in a smaller input prediction signal in equation 6.1.8 for x_k(t).
As can be seen from equation 6.1.10, the accumulated past experience
of the outstar in the z_ci(t) processes plays a major part in this
lateral inhibition and thus the past experience can be counted upon
to inhibit the effects of a random mistake. An outstar governed by
these equations combines absorption of random mistakes and active
inhibition of them.
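
The two properties claimed for the normalized probabilities, independence from the overall z amplitude and inhibition of the other nodes when one z process grows, follow directly from the ratio in equation 6.1.10. A minimal sketch, with the function name `probabilities` and the sample z values chosen only for illustration:

```python
def probabilities(z):
    """Equation 6.1.10: y_i = z_ci / sum over k of z_ck."""
    total = sum(z)
    if total <= 0.0:
        raise ValueError("at least one z process must be positive")
    return [zi / total for zi in z]


y_equal = probabilities([1.0, 1.0, 1.0])      # no node favoured
y_grown = probabilities([1.0, 4.0, 1.0])      # z_c2 has grown
y_scaled = probabilities([10.0, 40.0, 10.0])  # same ratios, larger amplitudes
```

Here the growth of the second z process lowers the first and third probabilities even though their own z processes are unchanged, and multiplying every z process by ten leaves the probabilities where they were.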

The major drawback of such an outstar is that it is not consistent
with the elements of embedding field theory presented in chapter one.
There neat geometric elements performing one function each were
presented. Because the y_i(t)'s perform the prediction signal
amplification function for this outstar, they should be located in the
arrowheads of the directed edges with the z processes. This raises the
problem of how the z_ci(t)'s from each of the arrowheads of directed
edges from the command node are made simultaneously available at all
the arrowheads to form the y_i(t)'s. We have constrained all other
information transmissions in the outstar to finite velocities along
directed edges. Because the z_ci(t)'s are instantaneously available
at all the arrowheads without any apparent means of traveling between
the arrowheads, the y_i(t) is a virtual process. The outstar described
by equations 6.1.7 through 6.1.10 is therefore called a virtual
laterally inhibiting outstar.

Although the virtual y_i(t) process is not consistent with the
elements of embedding field networks presented in chapter one, we
will study the performance of a virtual laterally inhibiting outstar.
Grossberg has done considerable theoretical work with it (Ref. 7).
In the realm of theory, there is no reason why a virtual process should
be excluded from consideration. A virtual process does not present
any difficulties to a digital simulation either. Moreover, if we
were to build electrical devices to make an outstar with, we would
have more trouble engineering the transmission delays for prediction
signals than engineering the virtual y_i(t) processes. The only
place where the virtual processes are clearly inapplicable is in the
nervous system of living organisms where all information transmissions
from one point in the system to another are at a finite velocity.
Whereas a virtual laterally inhibiting outstar is not useful as a
model for nervous systems, it is a legitimate device for study.

Section 6.2 Specifying the Parameters in a Virtual Laterally
Inhibiting Outstar


We will perform the same experiment on a virtual laterally
inhibiting outstar as has already been performed on the simple and
laterally inhibiting outstars. Therefore the parameters of the virtual
laterally inhibiting outstar are specified to be the same as in the
other outstars except where there are special considerations to be made:

Input parameters:

    A = 10
    δ = 1/α = 0.3 sec.

Network parameters:

    α = 3.3333 sec^-1
    τ = 0.3 sec.
    N = 3

Initial conditions on x_c and x_1 through x_3 are zero.

Selection of β, u, v, and the initial conditions on the z processes
will require some discussion.

As the y_i(t) are ratios of z_ci to the sum of all z_ck, we
want at least one of the z_ci to have a non zero initial condition to
avoid the problem of dividing by zero. The initial value should not
be too large to avoid biasing the network at the beginning of the
experiment. Therefore at least one z_ci will be specified to have an
initial condition of 0.4. Again, to prevent biasing of the network
in favor of predicting any one grid event, all the y_i(t) should be
approximately equal. This is accomplished if the initial conditions on
all the z_ci are equal. Therefore:

    z_ci(0) = 0.4    for i = 1, 2, 3

Notice that this means that there is a non zero initial condition on the
y_i(t)'s:

    y_i(0) = 0.3333    for i = 1, 2, 3

This means that the prediction signal at the beginning of the experiment
is split up evenly between all the nodes in the grid. A prediction
made in the initial state of the experiment will result in all grid
nodes responding equally. We must accordingly modify our interpretation
of what grid node responses mean. Heretofore we have considered the
outstar to be in a state of complete ignorance at the beginning of
an experiment. In the simple and laterally inhibiting outstars this
state of initial ignorance was specified by making the initial
conditions on the z processes zero. A prediction by one of those
outstars while it was in its initial state resulted in no response of
the grid nodes. Thus we were able to reinforce our interpretation of
initial ignorance by saying that there was nothing in the outstar's
memory and the outstar could predict nothing. The virtual laterally
inhibiting outstar does not have this nicety.

We will interpret the prediction responses of a laterally
inhibiting outstar to indicate total ignorance if all grid nodes respond
with the same amplitude. Equivalently, total ignorance is the state
in which all y_i(t) are equal. Note that this interpretation means
that the pattern composed of all the events represented by nodes in
the grid is not perceivable by the outstar. Excitation of all grid
nodes will result in the same values of the y_i(t) as they have
initially. This is equivalent to saying that white light is the same as
complete darkness in this outstar. Thus an intelligible pattern must be
composed of fewer than N events. In our experiment the pattern is
composed of one event out of three and thus is intelligible.

In previous outstars, v has been selected on a "so many presentations
mean well learning" criterion. This was due to the fact that the
prediction signal amplification processes, the z(t), had to grow to
a certain amplitude before a prediction would drive the grid nodes to
the same amplitudes as presentation of the pattern externally would
drive them. In the virtual laterally inhibiting outstar, this criterion
for v is meaningless. The prediction signal amplification processes
are the y_i(t), which by the outstar theorem are always less than or
equal to unity no matter what the amplitudes of the z processes are.
Thus small amplitude z processes will result in the same amplitude grid
node responses as large amplitude z processes as long as the ratios
z_ci(t) / [Σ_k z_ck(t)] remain the same. Thus specification of v has
nothing to do with the amplitude of grid node responses.

β, on the other hand, has a great deal to do with the amplitude
of the grid node responses. In previous outstars we have tried to
control the grid node responses so that their amplitudes during a
prediction were approximately equivalent to those attained by excitement
by an event. As v can not be used for that purpose in this outstar,
we will use β. With this intention, we run into the usual problem
with an outstar possessing some form of lateral inhibition. That is,
we would like to know how many events on the average compose a pattern.
In a laterally inhibiting outstar we saw that a β^- selected for an
average of a small number of events in a pattern resulted in inefficient
learning of a pattern composed of many more events. Nevertheless,
with sufficient instruction and/or predictions, the laterally
inhibiting outstar is able to "well learn" a pattern more complicated
than it was designed to learn.

In the virtual laterally inhibiting outstar, we do not have this
possibility for well learning a pattern more complicated than the ones
the network is designed to learn. If we have M < N events on the
average in a pattern, then the expected value for the y_i(t)
corresponding to events in a pattern is y_i(t) = 1/M after learning has
occurred. The y_i(t) for events not in the learned pattern are small.
Now, we can specify β such that:

    \beta = bM

where b is a constant necessary to result in a well learned grid
prediction response for a pattern composed of one event. With this
β, the input prediction signal to a node representing an event in
the learned pattern is:

    y_i(t) \beta x_c(t - \tau) = (1/M) bM x_c(t - \tau) = b x_c(t - \tau)

and thus we get well learned responses.

However, if there are fewer than M events in the pattern learned,
the prediction responses will be larger. If there are more than M events
in the pattern learned, the prediction responses will be smaller.
Because the y_i(t) do not change once the pattern is learned, there is
no possibility of changing this situation.
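
The dependence of the prediction amplitude on the number of events actually learned can be made concrete with a short calculation. The values of b and of the design number of events below are hypothetical, and the function name `prediction_gain` is introduced only for this illustration. It assumes, as in the text, that each learned event ends up with y_i = 1/(number of events in the pattern).

```python
def prediction_gain(beta, events_in_pattern):
    """Gain applied to x_c(t - tau) for a learned event.

    Assumes y_i = 1 / events_in_pattern after learning, so the input
    prediction signal is (beta / events_in_pattern) * x_c(t - tau).
    """
    return beta / events_in_pattern


b = 1.0                 # hypothetical gain giving a well learned response
designed_m = 3          # hypothetical average number of events per pattern
beta = b * designed_m   # the selection beta = bM from the text

gains = {k: prediction_gain(beta, k) for k in (1, 3, 6)}
```

With these numbers a pattern of exactly three events receives the intended gain b, a single-event pattern receives three times that gain, and a six-event pattern receives half of it.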

Thus the well learning criterion is an unrealistic requirement
for a virtual laterally inhibiting outstar that is confronted with the
possibility of learning a wide variety of patterns. The well learning
criterion was originally introduced because we adopted the convention
of reading the amplitudes of the x processes at the nodes as the
response of a node. As the measurement of very small or very large
amplitudes was difficult, the well learning criterion was adopted as
a measurement standard. For the virtual laterally inhibiting outstar
we could devise another virtual process to interpret grid node responses.
For instance, the probabilities:

    X_i(t) = x_i(t) [ \sum_{k=1}^{N} x_k(t) ]^{-1}

would be suitable. However, as the pattern we will teach the outstar
in this experiment is simple and we know that it will be composed
of at most one event, we can retain the well learning criterion for
interpretation. In a more general situation the above discussion
must be considered.

Since we are going to teach the outstar a pattern composed of
at most one event, and we are going to specify β according to the well
learning criterion, we can make a quick estimation of what β should
be. The input prediction signal to the grid node corresponding to the
event in the pattern should have a maximum amplitude equivalent to the
maximum amplitude of an input pulse:

    \beta y_i(t) (\max x_c(t - \tau)) = A

For one event in the pattern, y_i(t) = 1.0 after learning. Therefore
we want:

    \beta (\max x_c(t - \tau)) = \beta (A/\alpha)(1 - e^{-\alpha\delta})
                               = \beta (A/\alpha)(1 - e^{-1})
                               = 0.63 \beta (A/\alpha) = A

or:

    \beta = \alpha / 0.63 = 5.28
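
The arithmetic of this estimate is easy to reproduce. The sketch below recomputes it from the stated parameters (A = 10, α = 3.3333 sec^-1, δ = 1/α); the variable names are chosen here for illustration.

```python
import math

alpha = 10.0 / 3.0      # node decay rate, 1/sec
amp = 10.0              # input pulse amplitude A
delta = 1.0 / alpha     # input pulse duration, sec

# peak of the command node response to one rectangular pulse
rise_fraction = 1.0 - math.exp(-alpha * delta)   # 1 - e^-1
peak_xc = (amp / alpha) * rise_fraction

# beta that makes beta * peak_xc equal to the pulse amplitude A
beta_estimate = amp / peak_xc
```

Carrying 1 - e^{-1} to more digits than the rounded 0.63 gives an estimate that differs from the quoted 5.28 only in the third significant figure.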


Experimentally the appropriate value of β was found to be:

    \beta = 4.7

The 11% error is due to both the naiveté of the estimation and the error
inherent in the digital simulation.

Having specified β, we will specify v to be equal to β arbitrarily:

    v = \beta = 4.7

Only u remains to be specified. Since it is claimed that a virtual
laterally inhibiting outstar can use the large z's resulting from a
small u to absorb random mistakes, we will specify u to be small:

    u = 0.01 sec^-1

Note again that a small u means that the decay time of the z process,
1/u, is large compared to the presentation and/or prediction interval
to be used in the experiment.

Section 6.3 Results of the Experiments with a Virtual Laterally
Inhibiting Outstar


Figure 6.3.1 shows the results of presenting the pattern V_c → V_2
to the virtual laterally inhibiting outstar twice and then asking for
a prediction of the pattern. As can be seen from the x_2(t) trace,
V_c → V_2 was well learned. V_c → V_3 was learned slightly due to the
prediction signal's "tail". (Event 3 was presented with presentation
phase φ = +2δ with respect to the prediction signal.) Also note
that x_1(t) responds to prediction slightly although event 1 has not
been presented to the outstar.

Looking at the y_i(t) traces in figure 6.3.1 we can see why. All
three y_i(t) started with the same initial values y_i(t) = 0.3333 for
i = 1, 2, 3. The first presentation of the pattern resulted in y_2(t)
rising to a maximum value of nearly 0.8 while y_1(t) and y_3(t) decreased
to about 0.1 each. When event 3 was presented 2δ after event 2,
y_3(t) changed slightly due to correlation between the prediction signal's
tail and x_3(t). Note that on the second presentation of the pattern,
y_1(t) decreased again and y_2(t) increased. According to the outstar
theorem, more presentations of the pattern [...] because of correlation
between the tail of the prediction signal and x_3(t). However, in the two
presentations in figure 6.3.1 y_1(t) was still large enough to allow some
prediction signal through to x_1(t).

If we remember that it was agreed to interpret an equal response
from each of the grid nodes as no response, then we can place imaginary
thresholds, Γ_i, on the x_i(t) traces. Γ_i, shown in way of the third
response on the x_i(t) traces, was chosen such that if y_i(t) = 0.3333
for i = 1, 2, 3 in the outstar, all grid node prediction responses would
be subthreshold. Thus, by interpreting a node as not responding until
it is suprathreshold, we can interpret the results in figure 6.3.1
as saying that only V_c → V_2 was learned by the outstar. The results
of performing an experiment on a virtual laterally inhibiting outstar
with real, versus imaginary, thresholds will be reported later in this
chapter.

Figure 6.3.1. Results of teaching a virtual laterally inhibiting
outstar a pattern. (Traces of x_i(t), z_ci(t), and y_i(t) plotted
against time in seconds.)

Of interest is the fact that the y_i(t) did not change during the
prediction. This was an outstar theorem guarantee which is now
experimentally verified.
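
Both observations, the rise of y_2(t) during teaching and the constancy of the y_i(t) during a prediction, can be reproduced with a forward-Euler sketch of equations 6.1.7 through 6.1.10. This is an illustration under assumptions and not the thesis program: the presentation and prediction times (0, 4 and 8 seconds), the step size, the use of the estimated value 5.28 for both β and v, and the omission of the extra event 3 presentation are all choices made here.

```python
DT = 0.001  # Euler step in seconds (assumed)


def simulate_virtual_outstar(alpha=10.0 / 3.0, beta=5.28, u=0.01, v=5.28,
                             amp=10.0, tau=0.3, z0=0.4, t_end=12.0):
    """Forward-Euler sketch of equations 6.1.7 to 6.1.10 with N = 3.

    Returns the list of [y_1, y_2, y_3] at every time step.
    """
    delta = 1.0 / alpha
    lag = int(round(tau / DT))
    teach = (0.0, 4.0)             # assumed presentation times
    commands = teach + (8.0,)      # third command pulse asks for a prediction
    xc, x, z = 0.0, [0.0] * 3, [z0] * 3
    xc_hist, y_trace = [], []
    for k in range(int(round(t_end / DT))):
        t = k * DT
        xc_hist.append(xc)
        delayed = xc_hist[k - lag] if k >= lag else 0.0    # x_c(t - tau)
        total = sum(z)
        y = [zi / total for zi in z]                       # equation 6.1.10
        y_trace.append(y)
        pc = amp if any(s <= t < s + delta for s in commands) else 0.0
        p = [0.0, 0.0, 0.0]
        if any(s + tau <= t < s + tau + delta for s in teach):
            p[1] = amp             # event 2, in phase with the prediction signal
        dx = [-alpha * x[i] + p[i] + beta * y[i] * delayed for i in range(3)]
        dz = [-u * z[i] + v * delayed * x[i] for i in range(3)]
        xc += DT * (-alpha * xc + pc)
        x = [x[i] + DT * dx[i] for i in range(3)]
        z = [z[i] + DT * dz[i] for i in range(3)]
    return y_trace


y_trace = simulate_virtual_outstar()
y_start = y_trace[0]
y_before = y_trace[int(round(7.9 / DT))]    # just before the prediction
y_after = y_trace[int(round(11.9 / DT))]    # well after the prediction
```

During the prediction every grid node is driven by the same signal scaled by its own y_i(t), so all z processes grow by the same factor and their ratios, and hence the y_i(t), stay where they were.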

Figure 6.3.2 shows the results of continuing the experiment. A 
simulated random nistake was presented with the pattern by presenting 
event i at the same time as event 2 was presented, Note that on 
subsequent predictions, x, (t) remained subthreshold, It can be concluded 
that this virtual laterally inhibiting outstar is resistant to randon 
Paletakes. However, looking at the aay traces, it can be seen that the 
random mistake did reduce y,(t) and this effect persisted through 
subsequent predictions. Thus, even though the prediction responses 
of 7? are subthreshold, the y(t) remember the mistake. It will take 
several presentations of the correct pattern to undo the effect of the 
random mistake, In the discussion of using large amplitude z processes 
to absorb mistakes in section 3.4 it was show that the 2 processes 
would reflect the conditional probabilities oe Up to the end of 
the experiment in figure 6.3.2, the c event has been presented 6 times. 
Event 2 has been presented 3 times and event i has been presented 1 tine. 
Using the past history of the eee of the events in the environment 


to estimate the conditional probabilities PRY and pr we get: 
Cc 


129 2/c 





*OYBYSTU wopuBr su} Se VVTAUTS (a) 4 
on peurseT A[snopaoud svy YyOTUM Aeysyno ZupypqpuuT ATT e104zeT 


TONPATA B UO ONBYSTW WopUBL peysTnUTs wv Jo jyoossq “2°e°g ounsyy 
i 


(S99S) SWIL 
aon oe ie ee aC lf. OC 


(sodas) SWIL 
9. S BV Scans) me 











ie 
a6) 





“2 





a) xt} a)“ 


) _\gl- “a 


ow 





PR J = 1/6 = 0.1666 

Ie = 3/6 = 0.5 
The ratio PR, fe ye PR 7, = 00,3333. At the end of the experiment in 
figure 6.3.2, y, (t) = 0.15 and yo(t) = 0,6666, The ratio y,(t)/ y(t) 
is: 

y(t) / y(t) = (0,15)/(0.6666) = 0.225 

As the $y_i(t)$ are directly proportional to the $z_{ci}(t)$, the above
calculations show that the virtual laterally inhibiting outstar is
more resistant to random mistakes than would be expected if it were just
using large amplitude z processes to absorb mistakes. On the other hand,
we can expect the large z's to reflect the statistics of the
environment somewhat, and the inhibitory mechanism of the outstar is
not sufficient to completely overcome this. Thus some effect on the
$y_i(t)$'s must be expected from the statistics of the environment.
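The comparison between the relative frequencies and the final y values
can be checked with a few lines of arithmetic. The Python sketch below is
an illustration only: the function name `relative_frequency` is
introduced here and is not part of the original work, and the counts and
y values are simply the ones quoted in the text.

```python
def relative_frequency(event_count, command_count):
    """Estimate PR(event | command) from presentation counts."""
    return event_count / command_count


# Presentation counts quoted in the text for the experiment of figure 6.3.2.
command_presentations = 6
pr_1c = relative_frequency(1, command_presentations)  # event 1, the mistake
pr_2c = relative_frequency(3, command_presentations)  # event 2, the pattern

frequency_ratio = pr_1c / pr_2c

# Final values of y_1(t) and y_2(t) as quoted in the text.
y_ratio = 0.15 / 0.6666

print(f"PR_1/c / PR_2/c = {frequency_ratio:.4f}")
print(f"y_1 / y_2       = {y_ratio:.4f}")
```

A y ratio smaller than the frequency ratio means the mistaken event is
weighted less heavily than its relative frequency alone would suggest.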

Figure 6.3.3 shows the results of continuing the experiment and
trying to correct the learned pattern $V_c \to V_2$ with the pattern $V_c \to V_1$
by presenting event 1 three times in a row. As can be seen, the
correction attempt was not successful. Looking at the $z_{ci}(t)$ traces,
it can be seen that the past accumulated experience of $V_c \to V_2$ in the
large $z_{c2}(t)$ is so great that although the accumulated experience of
$V_c \to V_1$ in $z_{c1}(t)$ is increasing, it will require many more presentations
of $V_c \to V_1$ to say that the outstar has corrected the mistake. This was
a phenomenon noticed in the slowly forgetting simple outstar also.
Even though this outstar does laterally inhibit, it is not surprising
that a large amount of experience with a pattern will make it difficult
to convince the outstar that the pattern is a mistake. In order to
improve the virtual laterally inhibiting outstar's correctability,
the forgetting rate u will have to be increased.

[Figure 6.3.3. Attempt to correct a previously learned pattern in a
virtual laterally inhibiting outstar. The previously learned pattern
$V_c \to V_2$ is being corrected with the pattern $V_c \to V_1$. Horizontal
axes: TIME (SECS).]


section 6.4 A Virtual Laterally Inhibiting Outstar with Thresholds
and an Intermediate Forgetting Rate Designed to Learn
Patterns of More than One Event

In the previous section, it was concluded that the addition of
thresholds to a virtual laterally inhibiting outstar would be an aid
to the interpretation of responses. It was also concluded that a faster
forgetting rate would increase correctability. In this section, we will
test these conclusions. Additionally, it would be instructive to see
what happens when the pattern being taught to the outstar is composed
of more than one event.

In order to have sufficient possibilities available to study
teaching an outstar a pattern composed of more than one event, the number
of grid nodes, N, will be increased to N = 5. We will specify $\beta$ to
result in a well learned response for patterns composed of an average
M = 2.5 events. The input pulse parameters, the x process rise rate, $\alpha$,
and the transmission delay, $\tau$, will be kept the same as in section
6.2. The following parameters are therefore specified:

Rectangularly shaped input pulses:

$A = 10$

$\delta = 0.3$ sec.

$\alpha = 3.3333\ \mathrm{sec}^{-1} = 1/\delta$

$\tau = 0.3$ sec.


Since thresholds are to be added to the outstar, the equations
governing its performance will have to be changed:

6.4.1    $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

6.4.2    $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta y_i(t)\,[x_c(t - \tau) - \Gamma_c]^+$

6.4.3    $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau) - \Gamma_c]^+ [x_i(t) - \Gamma_i]^+$

6.4.4    $y_i(t) = z_{ci}(t) \Big/ \sum_{k} z_{ck}(t)$

Now we are faced with the problem of assigning values to the
thresholds $\Gamma_c$ and $\Gamma_i$. In section 3.5 it was concluded that putting
thresholds on the grid node x processes of a simple outstar was
inadvisable because this would result in eventual extinction of all memory.
This was due to the fact that the z processes decayed exponentially
at the rate u. It was quite possible for the z's to decay until the
prediction input signal $\beta z_{ci}(t)[x_c(t - \tau) - \Gamma_c]^+$ to the grid
nodes is unable to drive the grid node x process suprathreshold. In
this situation the outstar could no longer "pump up" the z process
because the correlating signal $v[x_c(t - \tau) - \Gamma_c]^+ [x_i(t) - \Gamma_i]^+$
would be zero. However, in the virtual laterally inhibiting outstar,
we do not have this problem. The prediction signal amplification processes
are the $y_i(t)$ which do not decay. Thus we may specify a non zero $\Gamma_i$
in equation 6.4.3.

In fact, use of a grid node threshold is advantageous in a virtual
laterally inhibiting outstar. Beside the interpretive advantage
discussed in section 6.3, there is a real improvement of performance.
Since the convention for interpreting the responses of a virtual laterally
inhibiting outstar says that equal responses by all the nodes in the
grid is a state of total ignorance, we have specified equal initial
conditions on the $y_i(t)$'s, i.e. $y_i(t) = (1/N)$ for all i. Now
suppose that we have a virtual laterally inhibiting outstar in
a state of total ignorance. This means that we have not presented
an intelligible pattern of grid events with the command event. However
it does not mean that the command event alone has not been presented
to the outstar. Indeed, until we decide to teach the outstar that the
command event is associated with an intelligible pattern, we may excite
the command node as many times as we like. Because the prediction
signal so generated is being split up evenly between the grid nodes, the
$y_i(t)$ will not deviate from a state indicative of total ignorance.
However, the correlating signal $v\,x_c(t - \tau)\,x_i(t)$ will become positive
on each such ignorant prediction and the $z_{ci}(t)$ will grow. We had
great difficulty correcting a learned mistake in section 6.3 because
the experience with the erroneous pattern was great. If the outstar
is allowed to accumulate experience with the ignorant pattern by spurious
excitements of the command node, then it will be equally difficult
to correct the ignorant pattern with an intelligible one.

Of course, increasing the forgetting rate should partially
alleviate this problem. However, it would be better to prevent the
outstar from accumulating experience with the ignorant pattern altogether.
A properly selected grid node threshold $\Gamma_i$ would achieve this result.
In the state of initial ignorance, the amplitude of prediction signal
inputs to the grid nodes is:

$(\beta/N)\,x_c(t - \tau)$

as $y_i(t) = 1/N$ for all i. Suppose $\beta$ has been specified to result in
a well learned response for an average of M < N events to a pattern.
Then the ignorant state input prediction signal is:

$(bM/N)\,x_c(t - \tau)$

where b is a constant which results in a well learned response from a
grid node when $b\,x_c(t - \tau)$ is the prediction input signal. Now a well
learned prediction response is one in which the maximum amplitude
of the response is equal to the maximum amplitude of a response elicited
by an event input pulse alone. Knowing the shape, amplitude, and
duration of the input pulses, the maximum amplitude of a well learned
response can be analytically calculated. For the input pulses of this
experiment, it is:

$x_{max}$ = max amplitude of well learned response $= (A/\alpha)(1 - e^{-1}) = 0.63\,(A/\alpha)$

Thus the proper $\Gamma_i$ to prevent accumulation of experience with the
ignorant pattern may be analytically specified by:

$\Gamma_i$ = max amplitude of prediction of the ignorant pattern response $= (M/N)(0.63\,A/\alpha)$

Knowing that M = 2.5, N = 5, A = 10, $\alpha$ = 3.333:

$\Gamma_i = 0.945$
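The threshold calculation above can be reproduced numerically. The
Python sketch below is illustrative only: the function names are
introduced here and are not from the original work, and it assumes, as
in this experiment, a rectangular pulse whose duration equals 1/alpha so
that the peak response is (A/alpha)(1 - e^-1).

```python
import math


def well_learned_peak(A, alpha):
    """Peak of x(t) for a rectangular pulse of amplitude A lasting 1/alpha."""
    return (A / alpha) * (1.0 - math.exp(-1.0))


def ignorant_pattern_threshold(M, N, A, alpha):
    """Grid node threshold equal to the peak ignorant-pattern response."""
    return (M / N) * well_learned_peak(A, alpha)


gamma_i = ignorant_pattern_threshold(M=2.5, N=5, A=10.0, alpha=1.0 / 0.3)
print(f"Gamma_i = {gamma_i:.3f}")
```

Using the unrounded factor 1 - e^-1 gives a value slightly above the
0.945 obtained with the rounded factor 0.63; the difference is in the
third decimal place.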

Note that this $\Gamma_i$ will work only for the input pulses specified.
Outstars are capable of learning patterns independent of the vigor
with which they are presented. They are also capable of learning patterns
composed of events presented at different strengths. Of course, in a
threshold outstar, there is a minimum pulse amplitude A which will
result in suprathreshold responses and thus learning. In this study
the input pulse specifications have been held
constant because a large number of outstars are being studied. A
detailed study of varying the input pulse specifications in each outstar
requires a prohibitive amount of time. In an outstar functioning
in an environment in which events occur with varied amplitudes, a
statistically average well learned response could be used to specify
a $\Gamma_i$ sufficient to prevent accumulation of experience with the
ignorant pattern on the average. However, this is not a study that will
be undertaken in this paper. In this study we are able to
know ahead of time the exact specifications of our input pulses and are
consequently able to specify the parameters of the outstars to result
in the performance we want.

Unfortunately, the above analytic method was not completely
understood at the time the experiment being reported was performed.
$\Gamma_i = 0.45$ was used and consequently the outstar was able to accumulate
experience with the ignorant pattern. Rather than re-perform the
experiment with the "correct" $\Gamma_i$, it was decided to present the data
collected with the "wrong" $\Gamma_i$ as it illustrates the problem of
accumulating experience with the ignorant pattern. Additionally, examination
of the data will reveal that there are other properties associated with
any non zero $\Gamma_i$ which are of more consequence than the property of
preventing accumulation of experience with the ignorant pattern.

It was decided to specify the command node threshold, $\Gamma_c$, such
that there would be no correlation with events presented with
presentation phase P greater than P = $\delta$ = 0.3 seconds. From previous
experimental data, $\Gamma_c = 1.0$ will satisfy this criterion.

Addition of a non zero threshold made the analytical specification of
$\beta$ too difficult. Thus a $\beta$ resulting in a well learned response
for a pattern composed of M = 2.5 events was experimentally determined.
The value so determined was:

$\beta = 27.9$

u was increased to test the conclusion that a faster forgetting
rate would result in improved correctability. The interval between
presentations and/or predictions is 1.8 seconds which is the same
as in previous experiments. Part of the reason for introducing the
virtual laterally inhibiting outstar was to use the accumulation of
experience with a small u to aid in resisting random mistakes by
absorption. Therefore we will not make u so large as to completely
destroy this effect. A decay time of twice the interval between
successive predictions and/or presentations was selected:

$u = 0.278\ \mathrm{sec}^{-1} = 1/(2 \times 1.8\ \mathrm{sec.})$

v was arbitrarily specified to be v = 10.
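To give a feel for what this choice of u means, the Python sketch below
computes u and the fraction of a z process that survives one interval of
free exponential decay. It is an illustration under a stated assumption:
the correlating term in the z equation is taken to be zero over the whole
interval, so that z simply decays at the rate u. The variable names are
introduced here.

```python
import math

interval = 1.8                # seconds between presentations/predictions
u = 1.0 / (2.0 * interval)    # decay time chosen as twice the interval

# Fraction of z remaining after one interval of free decay, exp(-u * interval).
remaining = math.exp(-u * interval)

print(f"u = {u:.3f} per second")
print(f"fraction of z remaining after one interval = {remaining:.3f}")
```

Under that assumption a z process that receives no reinforcement loses
roughly two fifths of its amplitude between successive presentations.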

Since a pattern composed of M = 2.5 events is impossible, it was
decided to teach the outstar a pattern composed of 3 events and then
test its random mistake resistance. An additional event presented with
presentation phase P = 2$\delta$ = 0.6 seconds was included with this pattern
to illustrate the effect of the command node threshold. After this part
of the experiment it was decided to attempt correction of the pattern
with a pattern composed of M = 2 events. It was decided to make the
correcting pattern consist of an event not included in the original
pattern and an event that was included in the original pattern. The
reason for this selection of correcting events was to see if there is
any difficulty in learning that only part of a previously learned pattern
is in error.

Before beginning to teach the outstar an intelligible pattern,
a prediction of the ignorant pattern was gotten by excitement of the
command node alone. This was initially done to demonstrate that a
properly selected $\Gamma_i$ would prevent accumulation of experience with the
ignorant pattern. Because of the error in specifying $\Gamma_i$ it serves
as a demonstration that accumulation of experience with the ignorant
pattern is a factor to be considered.





The foregoing discussion is summarized in the box below:

Equations governing performance of the outstar:

$\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

$\dot{x}_i(t) = -\alpha x_i(t) + \beta y_i(t)\,[x_c(t - \tau) - \Gamma_c]^+ + P_i(t)$

$\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau) - \Gamma_c]^+ [x_i(t) - \Gamma_i]^+$

$y_i(t) = z_{ci}(t) \Big/ \sum_{k} z_{ck}(t)$

where:

$[y]^+ = y$ for $y > 0$
$[y]^+ = 0$ for $y \le 0$

Input parameters:

pulse shape is rectangular
$A = 10$
$\delta = 0.3$ seconds

Network parameters:

$\alpha = 3.3333\ \mathrm{sec}^{-1} = 1/\delta$
$\beta = 27.9$
$\tau = 0.3$ seconds
$\Gamma_c = 1.0$
$\Gamma_i = 0.45$
$u = 0.278\ \mathrm{sec}^{-1} = 1/(2 \times 1.8\ \mathrm{sec.})$
$v = 10$

Initial conditions:

$x_c(0) = 0$
$x_i(0) = 0$ for all i
$z_{ci}(0) = (1/N)$ for all i
and: $y_i(0) = 0.2$ for all i
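The equations and parameters summarized above can be stepped forward
numerically. The Python sketch below is one possible reading, not the
simulation program used for the experiments: it uses a simple
forward-Euler scheme with a step size of 1 ms (both assumptions), writes
the command and grid thresholds as `gamma_c` and `gamma_i`, and takes the
initial z amplitudes to be 1/N, which is an assumption chosen so that
every y starts at 0.2. The helper names `pulse`, `silent`, `pos`, and
`simulate` are introduced here.

```python
def pulse(start, A=10.0, delta=0.3):
    """Rectangular input pulse of amplitude A lasting delta seconds."""
    return lambda t: A if start <= t < start + delta else 0.0


def silent(t):
    """An input that never occurs."""
    return 0.0


def pos(s):
    """Threshold-linear bracket [s]^+."""
    return s if s > 0.0 else 0.0


def simulate(P_c, P, t_end, dt=0.001, alpha=1.0 / 0.3, beta=27.9, tau=0.3,
             gamma_c=1.0, gamma_i=0.45, u=1.0 / 3.6, v=10.0, z0=None):
    """Forward-Euler integration of the four outstar equations."""
    N = len(P)
    x_c, x = 0.0, [0.0] * N
    z = list(z0) if z0 is not None else [1.0 / N] * N  # assumed initial z
    lag = int(round(tau / dt))      # transmission delay in steps
    history = []                    # past x_c values for the delayed signal
    peak_x_c, peak_x = 0.0, [0.0] * N
    for n in range(int(round(t_end / dt))):
        t = n * dt
        history.append(x_c)
        delayed = history[n - lag] if n >= lag else 0.0
        signal = pos(delayed - gamma_c)           # [x_c(t - tau) - Gamma_c]^+
        total = sum(z)
        y = [z_k / total for z_k in z]            # normalized z processes
        dx_c = -alpha * x_c + P_c(t)
        dx = [-alpha * x[i] + P[i](t) + beta * y[i] * signal
              for i in range(N)]
        dz = [-u * z[i] + v * signal * pos(x[i] - gamma_i)
              for i in range(N)]
        x_c += dt * dx_c
        x = [x[i] + dt * dx[i] for i in range(N)]
        z = [z[i] + dt * dz[i] for i in range(N)]
        peak_x_c = max(peak_x_c, x_c)
        peak_x = [max(peak_x[i], x[i]) for i in range(N)]
    total = sum(z)
    return {"y": [z_k / total for z_k in z], "z": z,
            "peak_x_c": peak_x_c, "peak_x": peak_x}


# Prediction of the ignorant pattern: the command node is excited alone.
result = simulate(P_c=pulse(0.0), P=[silent] * 5, t_end=1.8)
print("final y:", [round(y_k, 4) for y_k in result["y"]])
print("peak x_c:", round(result["peak_x_c"], 3))
```

Because all five grid nodes are treated identically in this run, the y
values stay equal, which is the state of total ignorance described in the
text. Passing `pulse(...)` objects in the list `P` would correspond to
presenting grid events along with the command event.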





section 6.5 An Experiment with a Virtual Laterally Inhibiting
Outstar with Thresholds and an Intermediate
Forgetting Rate Designed to Learn Patterns of More
than One Event


Figure 6.5.1 shows the first phase of the experiment described
in the previous section. The first response on the five grid node x
process traces is a prediction of the ignorant pattern elicited by
excitement of the command node alone. The z trace for all five $z_{ci}(t)$
shows the experience accumulated by this prediction. Although the increase
in amplitude of the z processes due to this single prediction is small,
many such predictions would result in an accumulation. Even this small
accumulation of experience with the ignorant pattern affects the
performance of the outstar when the pattern $V_c \to (V_1, V_2, V_3)$ is
presented to the outstar, as is shown by the $y_i(t)$ traces. One presentation
of the pattern is insufficient to result in convergence of the $y_i(t)$
to values describing the pattern and a second presentation is required.
Even though the grid node threshold $\Gamma_i$ is too small to prevent
accumulation of experience with the ignorant pattern it does improve
the learning performance of the outstar. Looking at the $x_5(t)$ trace
it can be seen that the first presentation of the pattern resulted in
a redistribution of the values for the $y_i(t)$. This redistribution
was sufficient to prevent $x_5(t)$ from going suprathreshold long enough
to add any appreciable amplitude to $z_{c5}(t)$ on the second presentation
of the pattern. Due to the reasonably rapid forgetting rate u, $z_{c5}(t)$
continued its decay during the second presentation. With $y_5(t)$ so
small that $x_5(t)$ can not be driven suprathreshold, future presentations
and/or predictions will result in no further increases in the amplitude
of $z_{c5}(t)$. This would be of particular importance if in the first
two presentations of the pattern $y_1(t)$, $y_2(t)$, and $y_3(t)$ had not
converged so closely to the final values describing the pattern of
$y_i(t) = 0.3333$ for i = 1, 2, 3. For, if the $y_i(t)$ were not so close
to their final values, then the prediction of the learned pattern would
have resulted in further convergence of the $y_i(t)$'s to this final
value. The prediction of the learned pattern shown in the fourth
response of the grid node x processes shows why. The prediction
responses for the nodes $V_1$, $V_2$, and $V_3$ included in the pattern are all
suprathreshold and result in an increase in amplitude for the
corresponding $z_{ci}(t)$'s. The prediction responses for the nodes $V_4$ and $V_5$
not included in the pattern are subthreshold and therefore do not
result in increases in the amplitudes of $z_{c4}(t)$ and $z_{c5}(t)$. Thus
the $y_i(t)$ continue to converge during predictions. However, the $y_i(t)$
converged so close to their final values in the two presentations of
the pattern shown, that this effect can not be seen in figure 6.5.1.
A higher resolution look at the $y_i(t)$ showed that $y_1(t)$, $y_2(t)$, and
$y_3(t)$ increased from 0.3096 to 0.3225 on this prediction. This
phenomenon is not in contradiction to the outstar theorem, which
guarantees only that the $y_i(t)$ will not diverge during a prediction.
Convergence is therefore theoretically permissible and grid node thresholds
result in convergence during predictions.

[Figure 6.5.1. Result of teaching a virtual laterally inhibiting
outstar with thresholds the pattern $V_c \to (V_1, V_2, V_3)$. Traces of
the z processes and the $y_i(t)$ are plotted against TIME (SEC).]
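The convergence of the y values during a prediction follows directly
from the normalization of the z processes. The Python sketch below
illustrates the mechanism with made-up z amplitudes; the numbers are
hypothetical and are not taken from the experimental traces, and the
function name `normalize` is introduced here.

```python
def normalize(z):
    """Return y_i = z_i / sum(z) for every grid node."""
    total = sum(z)
    return [z_k / total for z_k in z]


# Hypothetical z amplitudes: nodes 1 to 3 belong to the learned pattern,
# nodes 4 and 5 do not.
z_before = [9.0, 9.0, 9.0, 1.0, 1.0]

# During a prediction the suprathreshold pattern nodes gain amplitude,
# while the subthreshold nodes receive no correlation and only decay.
z_after = [9.5, 9.5, 9.5, 0.8, 0.8]

y_before = normalize(z_before)
y_after = normalize(z_after)

print("y before:", [round(y_k, 4) for y_k in y_before])
print("y after: ", [round(y_k, 4) for y_k in y_after])
```

The pattern nodes' share of the total grows and the other nodes' share
shrinks, so the y values move toward the pattern rather than away from it.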

In figure 6.5.1, event 4 was presented 2$\delta$ = 0.6 seconds after
events 1, 2, and 3 in the pattern. The command node threshold $\Gamma_c$
was chosen to prevent any correlation with events presented more than
$\delta$ = 0.3 seconds after arrival of the prediction signal at the arrowheads.
The fact that $y_4(t)$ and $z_{c4}(t)$ are identical to $y_5(t)$ and $z_{c5}(t)$ shows
that the command node threshold was successful. Presentation of
event 4 resulted in a correlation equivalent to no presentation at
all.

[Figure 6.5.2. Results of presenting a random mistake in the previously
learned pattern $V_c \to (V_1, V_2, V_3)$. Event 5 is the mistake. Note that
$x_5(t)$ does not go suprathreshold on the following prediction response.
Also note that $z_{c5}(t)$ does not grow and that $y_5(t)$ decreases.
Horizontal axes: TIME (SEC).]

As can be seen, the $\beta$ selected resulted in learned prediction
responses for the three events in the pattern of approximately the
same amplitudes as the response elicited by an input pulse alone.
(Compare the maximum amplitudes of the prediction responses of $x_1(t)$,
$x_2(t)$, and $x_3(t)$ with the maximum amplitude of $x_c(t)$.)

Figure 6.5.2 shows the continuation of the experiment. The pattern
$V_c \to (V_1, V_2, V_3)$ is presented with a simulated random mistake. Event
5 is this mistake. As can be seen the presentation of the random mistake
resulted in a healthy increase in $z_{c5}(t)$. However, this was
insufficient to drive $y_5(t)$ large enough to result in a suprathreshold $x_5(t)$
on prediction. Therefore $z_{c5}(t)$ continues to decay on subsequent
predictions and is bound for extinction. A slight decrease in $y_5(t)$
can be seen during the prediction response in figure 6.5.2. This is
due to the prediction convergence phenomenon described above. Thus we
can conclude that more predictions will result in the $y_i(t)$ converging
back to the values they had before the occurrence of the random mistake.
Figure 6.5.3 shows the results of continuing the experiment. The
previously learned pattern $V_c \to (V_1, V_2, V_3)$ is corrected by the pattern
$V_c \to (V_1, V_4)$. The difficulty with this correcting pattern is that
event 1 is included in both the original pattern and the correcting
pattern. As can be seen, it only required four presentations of the
correcting pattern to result in subthreshold $x_2(t)$ and $x_3(t)$ responses.
$V_2$ and $V_3$ represent the events 2 and 3 which were part of the old
pattern, but are not included in the new pattern. Additionally,
$y_2(t)$ and $y_3(t)$ have decreased in these four presentations to the point
where it can be considered that the dominant pattern is $V_c \to (V_1, V_4)$.
This situation should be compared to the unsuccessful attempt to correct
a pattern by three presentations of the correcting pattern in the
virtual laterally inhibiting outstar with a slow forgetting rate shown
in figure 6.3.3. It can be concluded that increasing the forgetting
rate does improve the correctability of a virtual laterally inhibiting
outstar.

[Figure 6.5.3. Correcting a previously learned mistake. The previously
learned pattern is $V_c \to (V_1, V_2, V_3)$. The correcting pattern is
$V_c \to (V_1, V_4)$. Traces of the x processes, the z processes, and the
$y_i(t)$ are plotted against TIME (SEC).]

The final values for the $y_i(t)$'s to describe the correcting pattern
are:

$y_1(t) = y_4(t) = 0.5$

$y_2(t) = y_3(t) = y_5(t) = 0$

As can be seen, $y_1(t)$ has slightly overshot its final value and
$y_4(t)$ has only reached a value of approximately 0.38. However $y_1(t)$ and $y_4(t)$
are converging toward each other. We may conclude that the previously
accumulated experience with event 1, which is common to both patterns,
is great enough to make convergence to the new pattern difficult.

It should be noticed that the prediction responses of $x_1(t)$ and
$x_4(t)$ at the end of the experiment are both of greater amplitude than
a response to an input pulse alone. This is an effect of lateral
inhibition. In the old pattern of M = 3 events, the prediction response
amplitudes of grid nodes associated with the pattern were slightly less than
the amplitude of a response to an input pulse alone. $\beta$ had been
specified to result in a well learned response for a pattern consisting on
the average of M = 2.5 events. Thus the 3 event pattern results in smaller
than well learned grid node responses and the 2 event pattern results
in larger than well learned grid node responses.








CHAPTER 7 OTHER FORMULATIONS FOR THE z PROCESS


section 7.1 Introduction 


In the discussion of the laterally inhibiting outstar it was
mentioned that the outstar was excitory biased. The equation for the
z processes in the laterally inhibiting outstar was:

$\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau)\,x_i(t)]^+$

where:

$[y]^+ = y$ if $y \ge 0$
$[y]^+ = 0$ if $y < 0$


By excitory biasing, it was meant that the learning z processes could only
assume non negative values. Thus the input prediction signal to a
grid node, $\beta z_{ci}(t)\,x_c(t - \tau)$, is always non negative and can not drive
the grid node's x process to negative amplitudes. In this way, the z
processes are biased against learning to inhibit grid nodes and are
biased in favor of learning to excite them.
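The excitory bias can be seen by stepping the z equation forward under
two conditions. The Python sketch below is an illustration only: it uses
a forward-Euler step, holds the delayed command signal and the grid node
amplitude constant, and uses illustrative parameter values; the function
names `pos`, `step_z`, and `run` are introduced here.

```python
def pos(s):
    """Threshold-linear bracket [s]^+."""
    return s if s > 0.0 else 0.0


def step_z(z, x_c_delayed, x_i, u, v, dt):
    """One Euler step of dz/dt = -u z + v [x_c(t - tau) x_i(t)]^+."""
    return z + dt * (-u * z + v * pos(x_c_delayed * x_i))


def run(z0, x_c_delayed, x_i, u=0.278, v=10.0, dt=0.001, steps=1000):
    """Integrate for steps * dt seconds with constant signals."""
    z = z0
    for _ in range(steps):
        z = step_z(z, x_c_delayed, x_i, u, v, dt)
    return z


z_excited = run(0.2, x_c_delayed=1.5, x_i=1.0)     # positive correlation
z_inhibited = run(0.2, x_c_delayed=1.5, x_i=-1.0)  # negative product clipped

print("z with excited grid node:  ", round(z_excited, 3))
print("z with inhibited grid node:", round(z_inhibited, 3))
```

With a negative grid node amplitude the bracket removes the correlation
term, so z can only decay toward zero and never becomes negative.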

In this chapter we shall drop the excitory biasing restriction and
conduct an investigation to see if there is any value in outstars
which can learn to inhibit grid nodes as well as excite them by
prediction signals from the command node. One reason for conducting this
study is that in the laterally inhibiting outstar we had to introduce
a new element in the embedding field network elements. The inhibitory
directed edges' arrowheads contained z processes which were assigned the
permanent value of -1. These z processes did not learn their values as do
the z processes in the other arrowheads in the network and we must
consider a non learning z process to be a new feature. In the avalanche
using a long axon and collaterals we avoided the use of z processes with
permanent values of +1. If we solve the pulse lengthening problems of
the outstar avalanche, then we will have to use another new element.
Development of a general formulation for z processes to cover all z
processes would eliminate the need for making exceptions for special
design features in a network. We will attempt to formulate more general
z processes in this chapter. Throughout, we shall be speaking of
embedding field networks which do not have any virtual processes
associated with them. The networks we shall discuss conform to the
embedding field elements of chapter one.







section 7.2 A Description of the States of the Processes in an
Outstar


A z process at an arrowhead correlates the prediction signals
arriving at the arrowhead and the x process at the node upon which the
arrowhead impinges; and it remembers what the correlations in the past
have been. The z process can therefore be considered to be a function
of the past and current states of the adjacent node and the prediction
signals. The z process itself can be thought of as being in various
states. For instance, we can think of a large amplitude z process
as being in an excitory state as it allows large prediction signals
through to excite the adjacent node. Small amplitude z processes could
be thought of as being in an unlearned or ignorant state.

In this chapter we shall use this idea that z processes are in
states which may be completely determined by the past history of the
states of the prediction signal and the grid node x processes. We shall
develop a state function $f(x_c, x_i)$ which maps the states of the
prediction signal $x_c$ and the grid node x process $x_i$ into a z process
state $z_{ci}$:

$f(x_c, x_i) = z_{ci}$

It will be found that this function f is a handy way to describe the
logic behind the learning process in an outstar and for this reason
we shall call the state function f a "logic". However, before the
usefulness of such a "logic" can be demonstrated, we must build up
a description of the states of the various processes in an outstar.

In outstars without virtual processes, we are concerned with four
processes:

1. Inputs, $P_c(t)$ and $P_i(t)$
2. Node x processes, $x_c(t)$ and $x_i(t)$
3. The prediction signal from the command node, $[x_c(t - \tau) - \Gamma_c]^+$,
where $\Gamma_c$ may be zero
4. The z processes, $z_{ci}(t)$
Input pulses, $P_c(t)$ and $P_i(t)$, have been used to indicate the occurrence
of events in the environment. There are two possible states for an
event. Either it is occurring, or it is not. We have transmitted
information about whether an event is occurring or not to the outstar
by the input pulse. A positive amplitude has been used to signify
that an event is occurring. A zero amplitude has been used to signify
that an event is not occurring. The following code can therefore
describe the state of inputs and the state of the events they describe:

(a) P = +1 indicates that an event is occurring and that the
associated input has a positive amplitude.

(b) P = 0 indicates that an event is not occurring and that the
associated input has a zero amplitude.

Node x processes have been used to signify the recent presentation
of an event and/or a recent prediction of an event. A large positive
amplitude has been interpreted as indicating that the outstar "thinks"
that the event represented by the node in question has occurred recently
or at least, should have occurred recently. Small positive amplitudes,
or zero amplitudes have been interpreted as indicating that the outstar
is not "thinking" anything about the event represented by a node.
Negative amplitudes have been interpreted as indicating the same
state as small or zero amplitudes.


By placing thresholds on the nodes, we were able to precisely
determine when an x process was of large enough positive amplitude to
indicate that the outstar is "thinking" an event. With thresholds
we may replace the word "large" in the preceding paragraph with the
word "suprathreshold". In the same manner "small", "zero", and
"negative" may be replaced with "subthreshold".

Thus we have two states for a node x process:

(1) $x_i = 1$ indicates a state where the x process at a node is of
sufficiently large positive amplitude, or is suprathreshold. This state
corresponds to the interpretation that the outstar is "thinking" about
the event represented by the node.

(2) $x_i = 0$ indicates a state where the x process at a node is of
small or zero positive amplitude, or is subthreshold. This state
corresponds to the interpretation that the outstar is not "thinking" about
the event represented by the node.

Although the notion "thinking" about corresponds to the
psychological interpretation of x processes' amplitudes, it is clumsy. In the
outstar, the only way an x process can get into the state $x_i = 1$ is
to respond to an input. That is, it must respond to excitement by an
input pulse or an input prediction signal, or both. Thus we could
describe the state $x_i = 1$ as "responding" or "excited". To avoid
semantic difficulties, the state $x_i = 1$ will be called the "excited"
state.

For these reasons also, the state $x_i = 0$ will not be called
"not thinking" about. Although "not excited" would apply well to
$x_i = 0$, it will not be used either. Instead the state $x_i = 0$ will be
called "ambient". "Ambient" is used because it refers to a state
which is the usual state of an x process. The ambient state $x_i = 0$
is also the passive state to which an x process always returns.
Further, it is the state of an x process when it is not being actively
driven by signals from outside the node. Thus it was felt that "ambient"
accurately describes the state $x_i = 0$.

In the above listing of states for x processes, an x process
responding with a negative amplitude was not included. Although
we have followed the convention of interpreting negative amplitudes as
being the same as ambient amplitudes, the inhibitory process that results
in negative amplitudes is not an ambient process. A negative amplitude
can be achieved only if the x process is being actively driven in the
negative direction by signals from outside the node. It is therefore
definitely not "ambient". There is no reason why our description of
the states of x processes should have to conform with our interpretation
of what those states mean. We will refer to an x process of negative
amplitude as being in the inhibited state and indicate this state by
$x_i = -1$. We will continue to interpret the state $x_i = -1$ as indicating
the same interpretive state as $x_i = 0$.

The difficulty with the inhibited state is that it is a subjective
state within the outstar. In the environment the state of an event can
be described as actively occurring or passively not occurring. There is
no such thing as an event that actively does not occur. However, we
saw that a practical simple outstar with only the two x process states
of being excited or being ambient had very little resistance to random
mistakes. We added lateral inhibition to allow the outstar an active
process by which it could subjectively prevent events from occurring.
Particularly, lateral inhibition was added to subjectively prevent
random mistakes from occurring in a previously learned pattern.

Suppose we had a black box that was claimed to be a learning machine.
The only way we could determine if it was a learning machine is to
teach it something and then see if it could reproduce what we taught it.
We would only be able to observe the events we were teaching it and the
box's response. Now, the box's response would be events to us.
Thus from our point of view the only states the box could communicate
to us would be the state of a response occurring or the state of a
response not occurring. The state of a response somehow being able to
not occur with greater vigor than simply not occurring is meaningless.
Thus, our interpretation of what an outstar is doing is limited to what
we could observe if the outstar were a black box.

We have used this interpretive convention and will continue to do
so. However, an outstar is not a black box to us. We can observe all
the processes occurring within it. Thus we are confronted with the
inhibited x process state which we can observe inside the outstar, but
which is meaningless when observed outside the outstar. Inside the
outstar the inhibited state is meaningful and definitely corresponds
to something other than ambient. Thus we have assigned a separate
state to describe the state of an x process which is being actively
driven to negative amplitudes by signals from outside the node.

There is some difficulty in saying when an x process is in the
inhibited state in an outstar with thresholds. An x process can be
actively driven subthreshold by inhibitory processes and still have
a non negative amplitude. For simplicity this situation will be
considered to be ambient. The inhibited state is therefore only
the state in which an x process has a negative amplitude. In case of
a negative amplitude, there is no confusion about the x process at a
node being actively driven toward negative values by signals from
outside the node.

In summary, the states of an x process at a node are:

(1) The excited state, $x_i = +1$. The amplitude of the x process is
large or suprathreshold.

(2) The ambient state, $x_i = 0$. The amplitude of the x process
is small, zero, or subthreshold.

(3) The inhibited state, $x_i = -1$. The amplitude of the x process
is negative.
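The three-state description can be written as a small classification
rule. The Python sketch below is an illustration only: the function name
`x_state` and the sample amplitudes are introduced here, and the default
threshold of zero is an assumption for nodes without a threshold.

```python
def x_state(amplitude, threshold=0.0):
    """Map an x process amplitude to +1 (excited), 0 (ambient), -1 (inhibited)."""
    if amplitude < 0.0:
        return -1      # actively driven negative: inhibited
    if amplitude > threshold:
        return 1       # suprathreshold: excited
    return 0           # small, zero, or subthreshold: ambient


# Sample amplitudes checked against a grid node threshold of 0.45.
for amplitude in (1.2, 0.3, 0.0, -0.1):
    print(amplitude, "->", x_state(amplitude, threshold=0.45))
```

An amplitude that has been driven below threshold by inhibition but is
still non negative is classified as ambient, following the simplification
adopted in the text.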
A prediction signal at an arrowhead is the originating node's method
of influencing the other nodes in the network. In order to define our
logic $f(x_c, x_i)$, we shall have to assign states to prediction signals
at an arrowhead. We could assign the same states to prediction signals as
we have assigned to x processes. This would mean that the prediction
signal is conveying the state of its originating node to the arrowhead.
However, prediction signals do more than convey the state of the
originating node to the arrowheads. They also influence the state of the
x process at the node upon which the arrowhead impinges. There is no
difficulty in allowing a prediction signal to have a large or
suprathreshold amplitude and describing this state as the excited state with
state value $x_c = +1$. However, the other states we may allow a prediction
signal to be in require some discussion.

First, consider the case of a prediction signal coming from a
node with a threshold on it. In the past we have used both "real"
thresholds and "imaginary" thresholds. The imaginary thresholds were
placed on a node for precision in interpreting when the node was
responding. The "real" thresholds were placed on a node to prevent the
z processes from learning spurious associations when the x process
was of small amplitude. In the case of the command node, thresholds
were used to prevent the command prediction signal from causing
spurious associations to be learned when it was of small amplitude.
This was accomplished by restricting the command prediction signal
to be zero until it was suprathreshold, i.e. $[x_c(t - \tau) - \Gamma_c]^+$.
In this case, we also prevented the prediction signal from influencing
the state of the grid node upon which it was impinging until it was
suprathreshold. This was accomplished by making the input prediction
signal to the grid node to be $\beta z_{ci}(t)\,[x_c(t - \tau) - \Gamma_c]^+$.
There is a reason behind this. Suppose we have an outstar grid which
is shared by many command nodes representing separate and distinct
command events. In the environment, a distinct pattern of grid events
usually occurs with each of the command events. If the outstar is
to function properly, it must be able to learn that a certain command
event, $c_1$, is associated only with the pattern, $\theta_1$, which
occurs with it in the environment. It must be prevented from learning
that the patterns occurring with the other command events in the
environment are associated with $c_1$.
A subthreshold command node x process only occurs when the command
event has not occurred recently in the environment. Thus we can expect
that a pattern not corresponding to this command event is on the grid
when the command node is subthreshold. By making the prediction signal
coming from a subthreshold command x process identically zero, we prevent
the outstar from building up a wrong association. Additionally, by
making the prediction signal identically zero, we prevent it from
exciting the grid nodes which are included in the pattern associated
with this particular command node. This is important. Consider two
command nodes $V_{c1}$ and $V_{c2}$ which represent events $c_1$ and
$c_2$ which occur in the environment with patterns $\theta_1$ and
$\theta_2$ respectively. Suppose $x_{c1}(t)$ is subthreshold and
$x_{c2}(t)$ is suprathreshold. Then we can expect that the grid node
x processes indicate that the pattern $\theta_2$ is on the grid. We have
already agreed to make the prediction signal
$[x_{c1}(t - \tau) - \Gamma_{c1}]^+$ identically zero to prevent
$V_{c1} \rightarrow \theta_2$ from being learned. Suppose however that
we allow the prediction input signal from $V_{c1}$ to the grid nodes
representing $\theta_1$ to become excited. The pattern on the grid would
therefore be the algebraic sum $\theta_1 + \theta_2$. The prediction
signal coming from the suprathreshold node $V_{c2}$ will therefore cause
the association $V_{c2} \rightarrow \theta_1 + \theta_2$ to be learned.
To prevent this possibility we have made the prediction signal input
from a subthreshold command node identically zero.
Thus the prediction signal from a subthreshold command node is
identically zero. We may as well drop the fiction of assuming that a
prediction signal was sent from the command node in the first place
and say that a prediction signal is sent out along the directed edges
only if the x process at the originating node is suprathreshold.

We also used "real" thresholds interpretively. We now have the
case that a subthreshold x process at a node is interpreted as no response.
Further, it is unable to influence other nodes in the network because
no prediction signal is sent from this node. Thus a certain amount of
consistency is added to our interpretation of the amplitudes of the x
processes. An x process which indicates no response also has no effect
on the other nodes and processes in the network. If we were unable
to measure the amplitude of an x process at its node, we would have
no way of knowing what amplitude it had as long as it was subthreshold.
From the point of view of an external observer or any of the other
processes in the outstar, a subthreshold x process is indeed ambient.

Thus we have an "ambient" state for prediction signals at an
arrowhead. It is indicated by a zero amplitude and is assigned the
state value $x_c = 0$. It must be remembered that this state arises from
an originating node that was subthreshold $\tau$ time units before.

In the case of an outstar without thresholds, we lose the
precision in defining when a prediction signal is ambient. We will
therefore describe a small amplitude on a prediction signal to be
ambient. As previously, "small" will mean small relative to the
maximum amplitude of a well learned response.

Having made the prediction signal coming from a subthreshold
x process identically zero, it would be silly to allow prediction signals
coming from an inhibited x process to be non zero. In this study we
will not consider prediction signals of negative amplitude. Part of
the reason is that allowing an inhibited x process to send out
prediction signals would violate the consistency we have just developed.
An x process state which is interpreted as no response should not be
able to influence the other processes and nodes in the network.
Another reason is that prediction signals of negative amplitude are
not required. We have seen that the negative amplitude of inhibitory
input prediction signals in lateral inhibition can be accounted for
by allowing z processes with negative values. In fact, lateral
inhibition has been the only case in which we have used inhibition. The
whole function of lateral inhibition was for an excited grid node x
process to inhibit the other nodes in the grid. Thus the emission
of inhibitory prediction signals from a node was only useful when
that node was in the excited state.

In summary, the states of a prediction signal at an arrowhead
are:

(1) The excited state, $x_c = +1$. The amplitude of the prediction
signal at the arrowhead is large and positive. This results from a
large or suprathreshold x process at the originating node $\tau$ time
units previously.

(2) The ambient state, $x_c = 0$. The amplitude of the prediction
signal at the arrowhead is small or zero. This results from a small,
zero, subthreshold, or negative x process at the originating node
$\tau$ time units previously.

We will assign the following states to a z process based upon its
amplitude:

(1) The excitory state, $z_{ci} = +1$. A z process is in this state
when its amplitude is large and positive.

(2) The ambient state, $z_{ci} = 0$. A z process is in this state when
its amplitude is small or zero.

(3) The inhibitory state, $z_{ci} = -1$. A z process is in this state
when its amplitude is negative.

The states for z processes at an arrowhead were assigned according
to what effect a prediction signal modified by the z process would
have on the node upon which the arrowhead impinged. Clearly, a z
process with a large positive amplitude would result in prediction
excitement of the impinged upon node. A z process with a negative
amplitude would result in prediction inhibition of the node. A z
process with a small or zero amplitude would result in very little
disturbance of the impinged upon node. The ambient state for a z
process is also the passive state for a z process. With a non zero
forgetting rate, it is the state to which a z process passively returns,
and it is the state which a z process assumes when it has not been
perturbed by signals from outside the arrowhead.

Up to now, the only way a z process could assume the inhibitory
state $z_{ci} = -1$ was by permanent assignment of a negative value to
the z process. In what follows, we will consider new conditions for the
equations for a z process that will allow a z process to learn to assume
the inhibitory state.





section 7.3 Logics 


Having described the states of the various processes in an outstar,
we are now ready to introduce a function $\mathcal{L}(x_c, x_i) = z_{ci}$
which describes how the state of a z process at an arrowhead is determined
from the states of the prediction signal at the arrowhead and x process
at the adjacent node. Throughout this discussion the state of a
prediction signal will be denoted by $x_c$. The state of the adjacent x
process will be denoted by $x_i$ and the state of the z process will be
denoted by $z_{ci}$. The choice for the subscripts was motivated by the
geometry of an outstar, but the discussion is not limited to outstars.
It applies to all networks which may be built from embedding field
elements. Throughout, the function $\mathcal{L}$ will be called a
"logic". We will introduce several distinct logics and they will be
distinguished by subscripts, i.e. $\mathcal{L}_0$, $\mathcal{L}_1$.

A logic is a tabular function. That is, we tabulate all the
possible combinations of prediction signal states and x process
states and assign a z process state to each combination. For example,
the logic $\mathcal{L}_0$ for the excitory biased outstars we dealt
with previously is defined by:


Definition of the Excitory Biased Logic, $\mathcal{L}_0$

  $x_c$    $x_i$  |  $\mathcal{L}_0(x_c, x_i) = z_{ci}$
    0        0    |    0
   +1        0    |    0
    0       +1    |    0
   +1       +1    |   +1
    0       -1    |    0
   +1       -1    |    0

(The inhibitory prediction signal state $x_c = -1$ has been excluded
from consideration for reasons of consistency as explained in section 7.2)


The reasons for calling this an excitory logic are clear. The
only states allowed for the z process are the ambient state $z_{ci} = 0$
and the excitory state $z_{ci} = +1$. The ambient state is passive. The
z process does not actively learn to be in the ambient state. Therefore
the only state which the z process can actively learn is the excitory
state. Thus the z process is biased to learn only the excitory state.

The excitory biased logic, $\mathcal{L}_0$, is implemented in an
outstar by the equation:

7.3.1   $\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\,[x_c(t - \tau) - \Gamma_c]^+\,[x_i(t) - \Gamma_i]^+$

where either or both thresholds can be zero.

The driving function in equation 7.3.1 is:

$v\,[x_c(t - \tau) - \Gamma_c]^+\,[x_i(t) - \Gamma_i]^+$

This function is always non negative. It can actively drive the z
process only when the prediction signal and the adjacent x process are
both in the excited state. Additionally, because the driving function
is always non negative, it can only drive the z process in the direction
of increasing positive amplitudes. Thus our tabular definition of
$\mathcal{L}_0$ conveniently summarizes the effects of equation 7.3.1
on the outstar.

Note that $\mathcal{L}_0$ only describes the immediate effect of the
states of the prediction signal and the adjacent x process on the z
process. It does not describe the current state of a z process based on
the entire past history of the prediction signal and x process states.
That is, $\mathcal{L}_0$ only tells us in which direction the z process
will be driven by the signals at a given time.
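
The relation between equation 7.3.1 and the tabulation of the excitory biased logic can be checked mechanically. The following is a minimal sketch, not part of the original simulation: the representative amplitudes (1.5, 0, -1.5), the zero thresholds, and all function names are assumptions chosen only for illustration.

```python
# Excitory biased logic as a lookup table keyed by the states (x_c, x_i).
L0 = {
    (0, 0): 0, (1, 0): 0,
    (0, 1): 0, (1, 1): 1,
    (0, -1): 0, (1, -1): 0,
}

def plus(w):
    """The [w]^+ operation: pass positive values, clip everything else to zero."""
    return w if w > 0.0 else 0.0

def drive_7_3_1(xc_delayed, xi, v=0.25, gamma_c=0.0, gamma_i=0.0):
    """Driving term of equation 7.3.1 (thresholds default to zero here)."""
    return v * plus(xc_delayed - gamma_c) * plus(xi - gamma_i)

def sign(w):
    """Return +1, 0 or -1 according to the sign of w."""
    return (w > 0) - (w < 0)

# Hypothetical amplitudes standing in for the three state values.
AMPLITUDE = {1: 1.5, 0: 0.0, -1: -1.5}

def logic_from_drive():
    """Tabulate the direction in which the driving term pushes the z process."""
    return {(xc, xi): sign(drive_7_3_1(AMPLITUDE[xc], AMPLITUDE[xi]))
            for (xc, xi) in L0}

derived = logic_from_drive()
```

Under these assumptions the derived table agrees with the tabulated logic entry by entry, which is the sense in which the table summarizes the equation.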

We shall now consider other logics for z processes. A general
approach would be to consider all the possible assignments of $z_{ci}$
states to each of the six distinct combinations of $x_c$ and $x_i$ states.
However, this results in $3^6 = 729$ logics. We will therefore have to
use some judgement in selecting the logics to be considered.

A key tenet of embedding field theory is that an excited
prediction signal and an excited x process should result in an excitory z
process. Thus we will only consider logics in which:

$\mathcal{L}(x_c = +1, x_i = +1) = +1$

Also, we have always started experiments with the z processes in
the ambient state. That is, the initial conditions on the z processes
have always been small or zero. We have interpreted those initial
conditions as a state of initial ignorance. It would be senseless to
allow a learning machine to develop from initial ignorance to learning
something by itself. For this reason we will only consider logics
in which:

$\mathcal{L}(x_c = 0, x_i = 0) = 0$

This reduces the possible logics to $3^4 = 81$. There are no
overriding reasons for excluding broad categories of the remaining logics.
However, 81 logics is just too many to consider. We will only consider
those which show promise in this study. These logics are defined in the
table below:


Table 7.3.1

  $x_c$    $x_i$  |  $\mathcal{L}_0$   $\mathcal{L}_1$   $\mathcal{L}_2$   $\mathcal{L}_3$
    0        0    |    0      0      0      0
   +1        0    |    0      0     -1     -1
    0       +1    |    0      0     -1      0
   +1       +1    |   +1     +1     +1     +1
    0       -1    |    0      0     -1      0
   +1       -1    |    0     -1     -1     -1
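
The counting argument and the tabulated logics can be reproduced by brute force. This is an illustrative sketch only; the dictionary layout and the names `STATES`, `all_logics`, `admissible`, and `TABLE_7_3_1` are assumptions, and the four columns are entered as read from table 7.3.1.

```python
from itertools import product

# The six admitted (x_c, x_i) combinations, in the row order of the tables.
STATES = [(0, 0), (1, 0), (0, 1), (1, 1), (0, -1), (1, -1)]
Z_STATES = (-1, 0, 1)

# Every possible assignment of a z state to each combination.
all_logics = [dict(zip(STATES, zs))
              for zs in product(Z_STATES, repeat=len(STATES))]

# Keep only logics with L(+1, +1) = +1 and L(0, 0) = 0.
admissible = [logic for logic in all_logics
              if logic[(1, 1)] == 1 and logic[(0, 0)] == 0]

# The four logics selected for study, column by column.
TABLE_7_3_1 = {
    "L0": dict(zip(STATES, (0, 0, 0, 1, 0, 0))),
    "L1": dict(zip(STATES, (0, 0, 0, 1, 0, -1))),
    "L2": dict(zip(STATES, (0, -1, -1, 1, -1, -1))),
    "L3": dict(zip(STATES, (0, -1, 0, 1, 0, -1))),
}

print(len(all_logics), len(admissible))
```

Each of the four selected logics satisfies both restrictions, so all of them are among the 81 admissible ones.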





$\mathcal{L}_0$ is the excitory biased logic we have considered
previously. $\mathcal{L}_1$ is the logic resulting from removing the non
negative restriction on the driving function in equation 7.3.1:

7.3.2   $\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\,x_c(t - \tau)\,x_i(t)$

As the tabulation of $\mathcal{L}_1$ shows, if $x_i(t)$ is negative,
$z_{ci}(t)$ will learn inhibition. $\mathcal{L}_1$ can be considered a
neutrally biased logic because the z process is not biased in favor of
excitation or inhibition.

$\mathcal{L}_1$ is interesting, but of dubious value. Suppose that all
the z processes in a network are in the ambient state at the beginning of
an experiment. That is, the network is in a state of initial ignorance
at the beginning of an experiment. Then a z process in this network
can not possibly assume the inhibitory state. The reason is that
the only states for input pulses are P = +1 and P = 0. The only states
which the x processes in the network can assume due to input pulses are
$x_i = +1$ and $x_i = 0$. Therefore the prediction signals in the
network can only assume states $x_c = +1$ and $x_c = 0$. The combination
of states $x_c = +1$, $x_i = -1$ can not occur. By the tabulation of
$\mathcal{L}_1$, the state $z_{ci} = -1$ can not be attained.

Thus the logic $\mathcal{L}_1$ is effectively equal to the logic
$\mathcal{L}_0$. If we allowed the permanent assignment of negative
values to z processes in a network governed by $\mathcal{L}_1$, then it
is possible for the learning z processes in the network to learn
inhibition. However, this requires the artificiality of a z process with
a permanently assigned value.
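
The argument that the neutrally biased logic collapses onto the excitory biased logic when no x process can become negative can be stated as a short check. This is a sketch with assumed names (`L0`, `L1`, `reachable`); the table entries are those of table 7.3.1.

```python
STATES = [(0, 0), (1, 0), (0, 1), (1, 1), (0, -1), (1, -1)]
L0 = dict(zip(STATES, (0, 0, 0, 1, 0, 0)))    # excitory biased logic
L1 = dict(zip(STATES, (0, 0, 0, 1, 0, -1)))   # neutrally biased logic

# From initial ignorance, input pulses can only produce states 0 and +1,
# so only these (x_c, x_i) combinations can ever occur.
reachable = [(xc, xi) for xc in (0, 1) for xi in (0, 1)]

same_on_reachable = all(L0[s] == L1[s] for s in reachable)
differ_somewhere = L0 != L1
```

The two tables differ only in the entry for $x_c = +1$, $x_i = -1$, which lies outside the reachable combinations.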

$\mathcal{L}_3$ defined in table 7.3.1 is particularly interesting in
an outstar. As can be seen from the tabulation, z processes in a network
governed by $\mathcal{L}_3$ can learn inhibition from a state of initial
ignorance without the use of z processes with permanently assigned
negative values. The two assignments:

$\mathcal{L}_3(x_c = +1, x_i = 0) = -1$

$\mathcal{L}_3(x_c = +1, x_i = -1) = -1$

insure this. In an outstar, these assignments mean that a command
node can learn to inhibit grid nodes which do not correspond to events
in the pattern associated with the command event. Consider a command
event c which usually occurs with the pattern $\theta$ in the environment.
Let the grid events {i} be the events which compose this pattern.
Let the grid events {j} be the remaining events represented by
grid nodes. Then the assignment:

$\mathcal{L}_3(x_c = +1, x_i = +1) = +1$

means that the z processes $z_{ci}(t)$ associated with the grid nodes
included in the pattern will learn excitation. The assignment:

$\mathcal{L}_3(x_c = +1, x_j = 0) = -1$

means that the z processes $z_{cj}(t)$ associated with the grid nodes not
in the pattern will learn inhibition. Further, the assignment:

$\mathcal{L}_3(x_c = +1, x_j = -1) = -1$

insures that once these z processes have learned inhibition, they will
continue to do so. The result is that after having learned the pattern,
presentation of the command event alone will result in the grid nodes
included in the pattern being excited. The grid nodes not included
in the pattern will be inhibited. If a random mistake occurs in the
pattern, the learned inhibition will cause it to be suppressed.

We will consider an outstar governed by $\mathcal{L}_3$ in detail in
the next chapter. The rest of this chapter will be devoted to an outstar
governed by $\mathcal{L}_2$.





section 7.4 Formulation of the z Process Conforming to Logic $\mathcal{L}_2$


The logic $\mathcal{L}_2$ is defined by the tabulation:

Table 7.4.1

  $x_c$    $x_i$  |  $\mathcal{L}_2(x_c, x_i) = z_{ci}$
    0        0    |    0
   +1        0    |   -1
    0       +1    |   -1
   +1       +1    |   +1
    0       -1    |   -1
   +1       -1    |   -1

In this section we shall develop a formulation for a z process
that will conform to this tabulation. However, we might inquire
beforehand if this is a worthwhile endeavor. The large number of
inhibiting assignments makes $\mathcal{L}_2$ appear somewhat useless. In
the discussion of $\mathcal{L}_3$ in the previous section we saw that the
following assignments in table 7.4.1 are useful:

  $x_c$    $x_i$  |  $\mathcal{L}_2(x_c, x_i) = z_{ci}$
    0        0    |    0
   +1        0    |   -1
   +1       +1    |   +1
   +1       -1    |   -1

We only have to establish the possible usefulness of the other two
assignments:

7.4.1   $\mathcal{L}_2(x_c = 0, x_i = +1) = -1$

7.4.2   $\mathcal{L}_2(x_c = 0, x_i = -1) = -1$


Assignment 7.4.1 above says that a z process will learn inhibition
if a grid node is excited and the prediction signal is not. This
combination can only occur if a pattern not corresponding to the command
event is on the grid. Thus learning to inhibit this pattern by
prediction when the command event is presented is useful. Assignment
7.4.2, however, can get us into trouble. Suppose there are two command
nodes $V_{c1}$ and $V_{c2}$ sharing the same grid. Let the command events
$c_1$ and $c_2$ represented by these nodes usually occur with the
distinct patterns $\theta_1$ and $\theta_2$ respectively. Let the event
represented by grid node $V_i$ be an event which is included in pattern
$\theta_1$ but not included in pattern $\theta_2$. Then we can expect
that excitement of $V_{c2}$ will result in the x processes of the grid
nodes assuming the values describing $\theta_2$. Additionally, because
of assignment 7.4.1, node $V_i$ will be inhibited. Therefore assignment
7.4.2 will result in $z_{c1i}(t)$ learning inhibition. If this is learned
sufficiently well, subsequent excitement of $V_{c1}$ will result in grid
node $V_i$ being inhibited even though it is part of the pattern
$\theta_1$ associated with $c_1$.


This vividly illustrates some of the problems we can get into with
logic $\mathcal{L}_2$. It is not the only one. If it happens that the
command node or the grid nodes in an outstar are randomly excited for
some time, then $\mathcal{L}_2$ will cause all the z processes in the
outstar to learn inhibition. When we get around to teaching the outstar
the pattern associated with the command event, we will have to overcome
this initial inhibitory biasing. In a real environment, this will
probably be the case. Our outstar will be "born" with all of its z
processes in the ambient state. It will then spend a period of time in
the environment before "going to school". In this period the random
coincidence of the command event and grid events is highly unlikely.
Therefore, when the outstar "goes to school" all of its z processes will
probably be inhibitory biased.

In order to prevent this inhibitory biasing from destroying the
outstar's ability to learn when it goes to school, we will limit
its effect. That is, we will limit the maximum negative amplitude
of a z process to a value that will insure that positive associations
can not be completely inhibited. This rather vague statement will
become clearer as we progress in the study of an outstar governed
by $\mathcal{L}_2$.

A formulation for the z processes in an outstar that conforms
to $\mathcal{L}_2$ is:

7.4.1   $\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\left( a\,(x_c(t - \tau) + x_i(t))^2 - b\,(x_c(t - \tau) - x_i(t))^2 \right)$

with $b > a$, $a > 0$

Expanding the right hand side of equation 7.4.1 we get:

7.4.2   $\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\left( -(b - a)\,x_c^2(t - \tau) - (b - a)\,x_i^2(t) + 2(a + b)\,x_c(t - \tau)\,x_i(t) \right)$

with $b > a$, $a > 0$

From equation 7.4.2 it can be seen that this formulation conforms to
$\mathcal{L}_2$.
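
That the two forms of the driving term agree, and that their sign reproduces table 7.4.1 when the state values themselves are used as amplitudes, can be verified numerically. This is a sketch under assumptions: the function names are hypothetical, and the constants a = 0.707, b = 1 are borrowed from the parameter values quoted later in the chapter.

```python
def drive_squares(xc, xi, a, b, v=1.0):
    """Driving term in the form of equation 7.4.1 (difference of squares)."""
    return v * (a * (xc + xi) ** 2 - b * (xc - xi) ** 2)

def drive_expanded(xc, xi, a, b, v=1.0):
    """Driving term in the expanded form of equation 7.4.2."""
    return v * (-(b - a) * xc ** 2 - (b - a) * xi ** 2
                + 2 * (a + b) * xc * xi)

def sign(w):
    """Return +1, 0 or -1 according to the sign of w."""
    return (w > 0) - (w < 0)

STATES = [(0, 0), (1, 0), (0, 1), (1, 1), (0, -1), (1, -1)]
L2 = dict(zip(STATES, (0, -1, -1, 1, -1, -1)))

a, b = 0.707, 1.0   # any values with b > a > 0 give the same signs
derived = {s: sign(drive_squares(s[0], s[1], a, b)) for s in STATES}
max_gap = max(abs(drive_squares(xc, xi, a, b) - drive_expanded(xc, xi, a, b))
              for xc in (0.0, 0.5, 1.9) for xi in (-1.2, 0.0, 0.7, 1.9))
```

The sign table obtained this way coincides with the tabulated logic, and the two algebraic forms differ only by rounding error.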

It is interesting exactly how this formulation came about. In
the progress of the experimental study for this thesis report, the
author began thinking of simulating an outstar on an analogue computer.
At that time the idea of logics had not been thought of. The author
was interested only in simulating an excitory biased outstar on an
analogue computer. To do this the z process driving function:

$v\,x_c(t - \tau)\,x_i(t)$

had to be simulated. The product of two varying signals is implemented
on an analogue computer by means of square law devices. For example,
the product xy is implemented by forming the sums:

$x + y$ and $x - y$.

These sums are then scaled by constant factors a and b. Each sum is
sent through a separate square law device and then the difference is
formed:

$a(x + y)^2 - b(x - y)^2$

Expanded, this is:

$(a - b)x^2 + (a - b)y^2 + 2(a + b)xy$

Thus, by selecting the scale factors a and b such that:

$a = b$

the result of this process is:

$2(a + b)xy$

Scaling this by $1/(2(a + b))$ results in the desired product.
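
The square law multiplication scheme described above is easy to confirm numerically. The sketch below is illustrative only; the function name and the default gains of 0.25 are assumptions, not values from the original analogue design.

```python
def square_law_product(x, y, a=0.25, b=0.25):
    """Form a(x + y)^2 - b(x - y)^2 and rescale by 1/(2(a + b)).

    With a == b the squared terms cancel and the result is exactly x*y.
    With a != b a residue (a - b)(x^2 + y^2)/(2(a + b)) is left over.
    """
    raw = a * (x + y) ** 2 - b * (x - y) ** 2
    return raw / (2.0 * (a + b))

matched = square_law_product(3.0, -2.0)                 # equal gains
mismatched = square_law_product(1.0, 0.0, a=0.707, b=1.0)  # unequal gains
```

With unequal gains and one input at zero the output is negative rather than zero, which is the inhibitory residue exploited by the formulation of this section.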

It was recognized that an outstar so simulated with $a \neq b$ would
have some of the desirable properties of $\mathcal{L}_2$. A digital
simulation of an outstar with the formulation 7.4.1 for the z processes
was run. The results were confusing and in an attempt to clearly define
the properties of this outstar the idea of logics and process states
was conceived. Having developed this concept, it was realized that it
was a handy description of the possibilities for formulating other z
processes. Additionally, it was a convenient method of predicting what
an outstar with various z process formulations would do.

The z process formulation given in equations 7.4.1 or 7.4.2 has
some interesting properties other than those described by the tabulation
of $\mathcal{L}_2$. The z process driving function in equation 7.4.1 is:

$D(t) = v\left( a\,(x_c(t - \tau) + x_i(t))^2 - b\,(x_c(t - \tau) - x_i(t))^2 \right)$

This is composed of two competing processes. The process driving the
z process in the direction of an excited state is
$a\,(x_c(t - \tau) + x_i(t))^2$. Competing with it is the process
$-b\,(x_c(t - \tau) - x_i(t))^2$ which drives the z process in the
direction of an inhibited state.


Of particular concern to us is the point where these competing
driving functions exactly balance one another. This point is achieved
when:

$a\,(x_c(t - \tau) + x_i(t))^2 = b\,(x_c(t - \tau) - x_i(t))^2$

Let $\mu$ be the ratio of the amplitude of a prediction signal at an
arrowhead and the adjacent node x process, i.e.:

$\mu = \dfrac{x_i(t)}{x_c(t - \tau)}$

then:

$\left( \dfrac{\mu + 1}{\mu - 1} \right)^2 = b/a$

$\dfrac{\mu + 1}{\mu - 1} = \pm\sqrt{b/a}$, which is a real value since $b/a > 1 > 0$.

Using the positive square root, we get:

$\mu_1 = \dfrac{\sqrt{b/a} + 1}{\sqrt{b/a} - 1}$

Using the negative square root, we get the inverse:

$\mu_2 = \dfrac{\sqrt{b/a} - 1}{\sqrt{b/a} + 1} = \dfrac{1}{\mu_1}$

Note that:

$0 < \mu_2 < 1$


This calculation shows us that there are two ratios, $\mu_1$ and
$\mu_2$, where the competing driving functions are equal. Note that
$\mu_2 = 1/\mu_1$, which is as it should be from the definition of
$\mu$. For a ratio between the prediction signal and the x process which
falls in the range:

$\mu_2 < \mu < \mu_1$

the total driving function D(t) is positive. Thus the z process is
being driven in the excitory direction. Note that the bounds $\mu_1$ and
$\mu_2$ of this region are both positive. Since we do not allow negative
prediction signals, this means that D(t) is positive only when both
$x_c(t - \tau)$ and $x_i(t)$ are positive in conformity with
$\mathcal{L}_2$. Outside the region $\mu_2 < \mu < \mu_1$, D(t) is
negative and the z process is being driven in the inhibitory direction.
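
The two balance ratios and the sign of the driving function on either side of them can be computed directly. This is a hedged sketch: the names `crossover_ratios`, `drive`, `mu_low`, and `mu_high` are assumptions, the driving function is normalized by the positive factor $v\,x_c^2(t - \tau)$, and a = 0.707, b = 1 are the values quoted later in the chapter.

```python
import math

def crossover_ratios(a, b):
    """Return the smaller and larger ratio at which the drive vanishes."""
    s = math.sqrt(b / a)          # requires b > a > 0, so s > 1
    mu_low = (s - 1.0) / (s + 1.0)
    mu_high = (s + 1.0) / (s - 1.0)
    return mu_low, mu_high

def drive(mu, a, b):
    """Driving function divided by v * x_c(t - tau)^2, as a function of mu."""
    return a * (1.0 + mu) ** 2 - b * (1.0 - mu) ** 2

a, b = 0.707, 1.0
mu_low, mu_high = crossover_ratios(a, b)
print(mu_low, mu_high)
```

For these constants the larger ratio comes out near 11.6 and the smaller near 0.086; the drive is positive between them and negative outside, including for negative ratios.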

The ratio $\mu_1$ and its reciprocal $\mu_2$ are called the cross
over ratios for obvious reasons. By specifying a and b to result in
a particular cross over ratio, we can specify a sort of "floating"
threshold on the z process. The thresholds we have considered
previously have all been "fixed". That is, the amplitude of the process
they were thresholding was compared to their fixed value. If it was
greater than this fixed value we got a different result than when it
was less. The floating threshold in the z process under consideration
is a function of the ratio of the amplitudes of the prediction signals
and the x process. If this ratio falls in one range we get one
result and if the ratio falls outside this range we get another. The
range is completely determined by the constants a and b.

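One further analytic property of D(t) is that it is a concave
function of the ratio $\mu$, since the coefficient of $\mu^2$ in

$D(t) = v\,x_c^2(t - \tau)\left( -(b - a)\mu^2 + 2(a + b)\mu - (b - a) \right)$

is negative. It therefore has a maximum with respect to $\mu$ which we
compute:

$\dfrac{\partial D(t)}{\partial \mu} = v\,x_c^2(t - \tau)\left( -2\mu(b - a) + 2(a + b) \right) = 0$

or the maximum of D(t) with respect to $\mu$ occurs at:

$\mu_{max} = \dfrac{a + b}{b - a}$

Note that $\mu_2 < \mu_{max} < \mu_1$.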
This says that the maximum "force" driving a z process in the
excitory direction occurs when the prediction signal and the x process
are in the ratio $\mu_{max}$ to one another. There is no minimum to
D(t). Thus the driving function D(t) seems to be biased in favor of
driving the z process in the inhibitory direction. To compensate
for this and to cover the initial inhibitory biasing of this z process,
we will artificially bound D(t) on the negative side. That is, we will
use a driving function D'(t) defined by:

$D'(t) = \theta(M_z + z_{ci}(t))\,D(t)$

where $M_z > 0$


and where: 


O iene 
By the proper selection of $M_z$, $z_{ci}(t)$ will be prevented from
assuming large negative values that would totally inhibit the learning
of excitory associations.
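
The location of the maximum of D(t) with respect to the amplitude ratio, derived earlier in this section, can be checked with a finite difference. The sketch assumes the normalized driving function $a(1 + \mu)^2 - b(1 - \mu)^2$ and the later parameter values a = 0.707, b = 1; the function names are hypothetical.

```python
import math

def drive(mu, a, b):
    """Driving function divided by v * x_c(t - tau)^2."""
    return a * (1.0 + mu) ** 2 - b * (1.0 - mu) ** 2

def mu_at_maximum(a, b):
    """Ratio at which the normalized drive is largest (requires b > a)."""
    return (a + b) / (b - a)

a, b = 0.707, 1.0
mu_peak = mu_at_maximum(a, b)

# Central difference estimate of the slope at the claimed maximum.
h = 1e-5
slope = (drive(mu_peak + h, a, b) - drive(mu_peak - h, a, b)) / (2.0 * h)

# Balance ratios for comparison.
s = math.sqrt(b / a)
mu_low, mu_high = (s - 1.0) / (s + 1.0), (s + 1.0) / (s - 1.0)
```

For these constants the maximum falls near a ratio of 5.8, inside the interval bounded by the two cross over ratios.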





section 7.5 Specification of the Parameters in an Outstar Conforming
to Logic $\mathcal{L}_2$


By incorporating the equation for a z process developed in the
previous section, we get the equations governing an outstar conforming
to logic $\mathcal{L}_2$:

7.5.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

7.5.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\,x_c(t - \tau)$

7.5.3   $\dot{z}_{ci}(t) = -u\,z_{ci}(t) + v\,\theta(M_z + z_{ci}(t))\left( a\,(x_c(t - \tau) + x_i(t))^2 - b\,(x_c(t - \tau) - x_i(t))^2 \right)$

With this formulation, the z processes in the arrowheads of the directed
edges from the command node can learn inhibition. If they do, then the
excitement of the command node will result in direct inhibition of the
grid nodes. For this reason, the outstar governed by equations 7.5
will be called a directly inhibiting outstar. We will run the same
experiment that we have used on other outstars. Therefore the parameters
of the directly inhibiting outstar are specified to be the same as in
the other outstars except where there are special considerations to
be made:
Input parameters: 

The input shape is rectangular 

A = 10 

§ == 0.3 seconds 
Network parameters:

$\alpha$ = 3.3333 seconds$^{-1}$

$\beta$ = 1

$\tau$ = 0.3 seconds

N = 3

Initial condition on all variables is zero.
The presentation rate for presentations and/or predictions will
be 1.8 seconds. u will be specified such that the decay time 1/u
for the z processes will be twice the presentation rate:

u = 0.2778 sec$^{-1}$ = 1/((2)(1.8 sec))

To specify a and b we must select the cross over ratio
$\mu_1 = 1/\mu_2$. $\mu = x_i(t)/x_c(t - \tau)$ is the ratio between
those functions at which the competing driving functions in the z
process balance. A cross over ratio of $\mu_1 = 11.5$ was selected
arbitrarily. Thus:

$b/a = \left( \dfrac{\mu_1 + 1}{\mu_1 - 1} \right)^2 \approx 1.4$

Arbitrarily, b was selected to be b = 1. Therefore a = 0.707.

With these parameters, v was experimentally determined on the two
presentations mean well learning criteria. The value of v so determined
was:

v = 0.25
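
The arithmetic behind the forgetting rate and the constant a can be retraced as follows. This is a sketch with assumed variable names; it takes 11.5 as the larger cross over ratio and uses the balance relation of section 7.4, so the value it yields for a agrees with the quoted 0.707 only to about two significant figures.

```python
presentation_interval = 1.8                   # seconds between presentations
u = 1.0 / (2.0 * presentation_interval)       # decay time 1/u is twice the interval

mu_high = 11.5                                # selected cross over ratio
b = 1.0                                       # selected arbitrarily
b_over_a = ((mu_high + 1.0) / (mu_high - 1.0)) ** 2
a = b / b_over_a

print(round(u, 4), round(b_over_a, 3), round(a, 3))
```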

$M_z$, the lower bound on negative excursions of the z processes,
requires some thought. $M_z$ should be specified such that an amplitude
of $z_{ci}(t) = -M_z$ will not prevent learning of excitory associations.
Consider equation 7.5.2 for the x processes when $z_{ci}(t) = -M_z$:

$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) - \beta M_z\,x_c(t - \tau)$

If the node $V_i$ is being excited by an input pulse we want the
combination of inputs to $V_i$:

$P_i(t) - \beta M_z\,x_c(t - \tau)$
a 2 C 
to be sufficiently positive to drive x(t) to values such that: 


x(t - 7) 2 ca (il ata) 





x, CL) > a? 
0 . 
If this condition is met, then the driving function for $z_{ci}(t)$ will
be positive and $z_{ci}(t)$ will move away from the value
$z_{ci}(t) = -M_z$ in the excitory direction. In such a situation, the
outstar will always be able to learn that the command node is excitorally
associated with a grid event by sufficiently many presentations.

Analytically, the maximum amplitude for $x_c(t - \tau)$ is
$(A/\alpha)(1 - e^{-1})$. If we make
$P_i(t) = \beta M_z (A/\alpha)(1 - e^{-1})$, then we could expect the
inhibitory input $\beta M_z\,x_c(t - \tau)$ and the excitory input
$P_i(t)$ to approximately cancel. In this case, with $P_i(t) = A$,
$\beta M_z = \alpha/(1 - e^{-1}) = 5.28$. Since $\beta = 1$, we therefore
want $M_z < 5.28$ at least. To allow room for errors, $M_z = 2.10$ was
selected.
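
The bound on the inhibition limit follows from the peak response of the command node to a rectangular pulse. The short calculation below is a sketch; the variable names are assumptions, and the pulse width of 0.3 seconds and amplitude of 10 are the input parameters listed earlier in this section.

```python
import math

alpha = 3.3333    # decay rate of the x processes, 1/seconds
beta = 1.0        # gain of the prediction input to a grid node
amp = 10.0        # rectangular input pulse amplitude A
width = 0.3       # rectangular input pulse duration, seconds

# Peak of x_c at the end of the pulse, starting from zero.
xc_peak = (amp / alpha) * (1.0 - math.exp(-alpha * width))

# Inhibition limit at which beta * M * xc_peak just cancels a pulse of height A.
m_cancel = amp / (beta * xc_peak)

print(round(xc_peak, 3), round(m_cancel, 2))
```

The cancelling value evaluates to about 5.27, in line with the figure of 5.28 quoted above, so the selected limit of 2.10 leaves a wide margin.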

To investigate the effect of random occurrences of the command
event in inhibitorally biasing the outstar before it "goes to school",
the command node alone was excited once before presentation of the pattern.
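
To make equations 7.5.1 to 7.5.3 concrete, the following is a minimal forward Euler sketch for one command node and one grid node. It is not the program used for the experiments in this thesis. The step size, the function and argument names, and in particular the treatment of the bounding factor as a unit step (1 when $M_z + z_{ci}(t) > 0$, otherwise 0) are assumptions made only for illustration.

```python
def simulate(command_onsets, grid_onsets, t_end, dt=0.001,
             alpha=3.3333, beta=1.0, tau=0.3, u=0.2778, v=0.25,
             a=0.707, b=1.0, m_z=2.10, amp=10.0, width=0.3):
    """Integrate equations 7.5.1-7.5.3 and return (t, x_c, x_i, z) samples.

    command_onsets and grid_onsets list the start times, in seconds, of
    rectangular input pulses of height amp and duration width.
    """
    n = int(round(t_end / dt))
    delay = int(round(tau / dt))
    xc_hist = [0.0] * (n + 1)      # stored so x_c(t - tau) can be looked up
    xi = 0.0
    z = 0.0
    trace = []

    def pulse(t, onsets):
        return amp if any(t0 <= t < t0 + width for t0 in onsets) else 0.0

    for k in range(n):
        t = k * dt
        xc = xc_hist[k]
        xc_del = xc_hist[k - delay] if k >= delay else 0.0
        # ASSUMPTION: the bounding factor is a unit step on (M_z + z).
        gate = 1.0 if (m_z + z) > 0.0 else 0.0
        d = a * (xc_del + xi) ** 2 - b * (xc_del - xi) ** 2
        dxc = -alpha * xc + pulse(t, command_onsets)
        dxi = -alpha * xi + pulse(t, grid_onsets) + beta * z * xc_del
        dz = -u * z + v * gate * d
        trace.append((t, xc, xi, z))
        xc_hist[k + 1] = xc + dt * dxc
        xi += dt * dxi
        z += dt * dz
    return trace

# Command event alone: the z process should drift slightly negative.
alone = simulate([0.0], [], t_end=1.8)
# Grid event timed to coincide with the arriving prediction signal.
paired = simulate([0.0], [0.3], t_end=1.8)
print(alone[-1][3], paired[-1][3])
```

In this sketch, exciting the command node alone leaves the z process at a small negative value, while pairing the command event with a grid event at zero presentation phase drives it positive. Both outcomes depend on the assumed step form of the bound and on the step size, so they should be read as qualitative only.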





section 7.6 Experiments with a Directly Inhibiting Outstar 


Figure 7.6.1 shows the results of performing an experiment with
the directly inhibiting outstar specified in the last section. Note
that excitement of the command node alone at the beginning of the
experiment results in small negative amplitudes for the z processes.
The directly inhibiting outstar is thus slightly inhibitorally biased


before “going to school". "School" begins with the second presentation 
of the comuend event. From the xp (t) trace it can be seen that the 
pattern Vi Vy was approximately well learned in two presentations. 
Event 1 is not presented. The Z a4 (t) trace shows that the outstar 
has learned to directly inhibit grid node V,. 
Event 2 was presented with Q = 0 presentation phase with respect 

to the arrival of the prediction signal. (Presentation phase has been 
explained in section 3.5.) Event 3 was presented with presentation 
phase P = 0.6 seconds after event 2, As can be seon from the x(t) 
and Z93(t) traces, the outstar has learned to inhibit grid node Ve 

The experiment was continued to test the resistance of the directly
inhibiting outstar to random mistakes in the pattern. Figure 7.6.2
shows the results. Event 1 is the simulated random mistake. As can
be seen from the $x_1(t)$ and $z_{c1}(t)$ traces, the direct inhibition
the outstar learned before the occurrence of this mistake resulted in
little damage to the pattern. $z_{c1}(t)$ rose to a small positive
amplitude which is decaying. The prediction following occurrence of the
mistake did not cause $z_{c1}(t)$ to increase. Thus we may conclude that
the outstar will forget the mistake entirely in time.


Figure 7.6.1  Results of an experiment conducted with an outstar
governed by logic $\mathcal{L}_2$. (Traces of P(t), x(t), and z(t)
against time in seconds.)

Figure 7.6.2  Result of presenting a simulated random mistake in the
previously learned pattern. Event 1 is the mistake. (Traces of P(t),
x(t), and z(t) against time in seconds.)

Figure 7.6.3  The result of attempting to correct the previously
learned pattern with the correcting pattern. (Traces of z(t) against
time in seconds.)

The experiment was continued to test the correctability of the
directly inhibiting outstar. Figure 7.6.3 shows the results. An attempt
was make to correct the previously learned pattern Ve me with the 
correcting pattern las by presenting =e twice. Figure 7.6.3 
shows that the attempt was unsuccessful, The first presentation 
of aro was treated like a random mistake. The previously learned 
inhibition of Ne was sufficient to prevent 2_3(t) from rising to much 
of a positive amplitude, The next presentation of Nese Me dia result 
in a healthy increase in Zy3\t). Further presentations of Ve 
will result in it being learned better. However, eae was not 
"unlearned" during this time. Both excitements of the command node 
resulted in approximately well learned responses by x,(t). 

The only method by which this directly inhibiting outstar can
correct a pattern is to forget the old pattern while learning the new
pattern. From the $z_{02}(t)$ trace we can see that the presentation rate
for the correcting pattern was just right to result in "pumping up"
$z_{02}(t)$ such that $v_2$ remained well learned during the correction
attempt. Thus the outstar could not forget the old pattern while
learning the new one. The addition of lateral inhibition and/or
increasing the forgetting rate $u$ would probably increase the
correctability, but these options were not investigated.

In the discussion of this logic in section 7.4, it was noted
that random excitation of the grid nodes without excitation of the
command node might result in inhibition of a learned pattern. The
assignment:

(= Oe ee ed 
2 ie a 
is the source of this possible trouble. It was decided to see if this
was indeed a problem. All the grid nodes were excited twice without
exciting the command node. The command node was then excited to see
what would be predicted on the grid. Note that because of the
unsuccessful correction attempt, the outstar had learned $(v_2, v_3)$
at the end of figure 7.6.3.

[Figure 7.6.4. Result of excitement of grid nodes alone on the
previously learned pattern. Horizontal axis: time (sec).]

Figure 7.6.4 shows the result. $z_{02}(t)$ and $z_{03}(t)$ were very
slightly driven in the direction of inhibition by the grid node
excitements. However, as the prediction shows, the pattern $(v_2, v_3)$
is still in the outstar's memory. It can still be completely recovered
by "pumping up".

This result does not mean that there is no problem with random
excitements of the grid nodes in an outstar conforming to this logic.
It only means that it is not a significant problem in the outstar under
study.





section 7.7 Generality of the Formulation of the z Process 


Conforming to Logic d, 


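The z process formulation conforming to this logic that we have
used is:

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + v\,\theta(M_z + z_{12}(t))\left[a\,(x_1(t-\tau) + x_2(t))^2 - b\,(x_1(t-\tau) - x_2(t))^2\right] \tag{7.7.1}$$

where:

$$\theta(y) = \begin{cases} 1 & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$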

As was shown in section 7.5, setting a = b in equation 7.7.1
will result in:

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + \theta(M_z + z_{12}(t))\,2v(a + b)\,x_1(t-\tau)\,x_2(t) \tag{7.7.2}$$

Equation 7.7.2 describes a z process conforming to the neutrally
biased logic of table 7.3.1. By setting $M_z = 0$ in equation 7.7.2,
we get the excitory logic of table 7.3.1, which has been the logic
we have used in the simple and laterally inhibiting outstars. Thus
the z process formulated by equation 7.7.1 is rather general. By
specifying the parameters a, b, and $M_z$ we have a choice of which logic
and what type of outstar we shall get.
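
The reduction that follows from setting a = b can be checked numerically. The sketch below is illustrative only: it assumes the general z process has the form "decay plus a gated difference of squares" described above, with a unit-step gate, and the function and argument names (`z_rate`, `x1_delayed`, `m`) are hypothetical rather than taken from the thesis programs.

```python
def theta(y):
    """Unit step gate: 1 when y > 0, otherwise 0."""
    return 1.0 if y > 0 else 0.0


def z_rate(z, x1_delayed, x2, a, b, u, v, m):
    """Assumed right-hand side of the general z process.

    z          current z amplitude
    x1_delayed prediction signal x1(t - tau) at the arrowhead
    x2         x process of the receiving node
    a, b       weights on the squared sum and squared difference
    u, v       decay rate and learning rate
    m          magnitude of the negative limit on z
    """
    growth = a * (x1_delayed + x2) ** 2 - b * (x1_delayed - x2) ** 2
    return -u * z + v * theta(m + z) * growth


def z_rate_equal_ab(z, x1_delayed, x2, a, u, v, m):
    """Same process with a == b: the bracket collapses to 4*a*x1*x2."""
    return -u * z + v * theta(m + z) * 4.0 * a * x1_delayed * x2
```

With a = b the two functions agree for any inputs, since 2(a + b) = 4a. With m = 0 the gate closes as soon as z reaches zero, so z can never be driven negative, which is the excitory case.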
The general application of equation 7.7.1 does not end there.
By appropriate specification of the parameters a, b, $M_z$, and v we can
make a z process governed by it "practically inhibitorally biased".
Suppose, for example, that we wished to make a laterally inhibiting
outstar. We connect all of the grid nodes with directed edges and
arrowheads. Previously we have used z processes with a permanently
assigned negative value to get laterally inhibiting prediction signals.
However, we can now make all the z processes in the network conform
to equation 7.7.1. By proper selection of a, b, and v we can make the
z processes in the laterally inhibiting arrowheads negative most of
the time.

To do so, we depend on the statistics of the environment. It is
unlikely that any two x processes in the grid will be excited to
identically equal amplitudes for very many times in succession.
Therefore, by specifying the cross over factor to be approximately 1,
we can be almost certain that the z processes in the arrowheads will
learn inhibition.

An experiment was conducted to test this conclusion. Two nodes,
$v_1$ and $v_2$, were connected by a directed edge as shown at the bottom of
figure 7.7.1. The originating node, $v_1$, was excited four times in
succession by input pulses. The "receiving" node, $v_2$, was excited
twice exactly when the prediction signal arrived at the arrowhead.
The parameters used in the experiment were:
Input parameters: 

Input pulse shape is rectangular 

A = 10 

& = 0.3 seconds 
Network parameters: 

All initial conditions were zero 


A= 3,3333 eee 


gi 
4 


u = 0.278 seconds 
vT = 0.3 seconds 


1,0 


< 
iN 


a = 0.12

b = 1,u

M = 2.10

[Figure 7.7.1. Demonstration of a z process which learns inhibition.]

From the selection of a and b, the cross over ratio was computed
to be 2.

Figure 7.7.1 shows the result. The initial excitement of $v_1$ alone
resulted in the $z_{12}(t)$ process being driven to its negative limit,
$-M_z$. The two presentations of event 2 exactly at the time that the
prediction signal $x_1(t - \tau)$ arrived at the arrowhead resulted in
$z_{12}(t)$ being driven to a positive amplitude. However, the fourth
excitement of $v_1$ resulted in $z_{12}(t)$ returning to inhibitory values.
Thus we may conclude that the $z_{12}(t)$ process will behave as an
inhibitory process most of the time. Note also that we did not have to
specify the cross over factor to be exactly 1 to get this result.
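
The relation between a, b, and the cross over ratio can be made concrete. The sketch below assumes the growth bracket is a(x1 + x2)^2 - b(x1 - x2)^2 with 0 < a < b, and solves for the amplitude ratios r = x2/x1 at which it changes sign. The function names are hypothetical. The value b = 1.08 used in the usage note is not read from the parameter list; it is chosen only because, together with a = 0.12, it reproduces a cross over ratio of 2.

```python
import math


def crossover_ratios(a, b):
    """Ratios r = x2/x1 where a*(x1 + x2)**2 - b*(x1 - x2)**2 is zero.

    Dividing by x1**2 gives a*(1 + r)**2 = b*(1 - r)**2, so with
    s = sqrt(a/b) the two roots are (1 - s)/(1 + s) and (1 + s)/(1 - s).
    They are reciprocals of each other.
    """
    if not 0 < a < b:
        raise ValueError("expected 0 < a < b")
    s = math.sqrt(a / b)
    return (1 - s) / (1 + s), (1 + s) / (1 - s)


def growth_sign(x1, x2, a, b):
    """Sign of the growth bracket: +1 excitory, -1 inhibitory, 0 at cross over."""
    g = a * (x1 + x2) ** 2 - b * (x1 - x2) ** 2
    return (g > 0) - (g < 0)
```

Calling `crossover_ratios(0.12, 1.08)` gives ratios close to 0.5 and 2. Under these assumed values the bracket is positive only while the two amplitudes are within a factor of two of each other, and negative otherwise, which is why unequal excitations drive the z process toward inhibition.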

Of course, specifying a = 0 in equation 7.7.1 would make the z
process always inhibitorally biased. The above experiment was conducted
to show that we did not have to go to this extreme to get the desired
results.

If we go to the other extreme and specify b = 0 in equation 7.7.1,
we get:

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + v a\,\theta(M_z + z_{12}(t))\,(x_1(t-\tau) + x_2(t))^2 \tag{7.7.3}$$

This formulation will result in the z process being driven to
positive amplitudes whenever $x_1(t - \tau)$, or $x_2(t)$, or both, are
nonzero. Thus we can replace the permanently assigned positive z processes
in the command node cascade in an avalanche with "learning" z processes
that are governed by the same general equation as all the other z
processes in the avalanche.

The z process formulation given by equation 7.7.1 is therefore
general enough to be used in all the applications we have found for
z processes in outstars and avalanches. We could specify that all
the z processes in a network be governed by this formulation. The
special features of the network such as a command cascade or lateral
inhibition can be implemented by appropriate selections for the
parameters a, b, and $M_z$. Thus the design of an outstar or an avalanche
could be reduced to specification of these parameters at each of the
arrowheads in the network.





CHAPTER 8 THE CHEMICAL OUTSTAR 


section 8.1 Introduction 


At this point there are three outstanding promises made in the
previous chapters. In the introduction to chapter one, it was promised
that this thesis would examine Grossberg's theoretical proposal for
the neurophysiological processes that allow a living organism to
learn. In chapter five it was promised that a solution to "pulse
lengthening" in a cascade of nodes would be developed. In chapter
seven it was promised that an examination of a logic corresponding to
one of the logics in table 7.4.1 would be made.

We shall keep these promises in this chapter. A synthesis of
all three will be developed and we shall examine its performance.





section 8.2 The Analogy Between Embedding Field Networks and the 


Nervous System of Living Organisms 


Figure 8.2.1 shows the analogy between embedding field network
elements and the elements of the nervous system of a living organism.
A thorough perusal of figure 8.2.1 would explain this analogy to the
reader better than volumes of words.

For the uninitiated, a brief description of the neurophysiological
elements and processes shown in figure 8.2.1 is offered. The dark
cell body and axon shown is an interneuron in the spinal column of
a vertebrate. The light cell body and axon is a motoneuron. Neurons
are living cells. They occur in organisms in a variety of shapes.
However, they always consist of a reasonably elongated part called
an axon, and a "fatter" part called the cell body. The cell body
contains the cell's nucleus. An interneuron and a motoneuron were
chosen for figure 8.2.1 because they have been extensively studied and
the information shown was easy to collect.

The traces shown are voltages recorded by microelectrodes inserted
into the interneuron and the motoneuron at the places shown. These
recordings correspond to the following sequence of events: The
interneuron is excited by an electrical signal delivered to the cell
body by a microelectrode. This signal results in the membrane potential
of the cell body rising from its resting potential of approximately
-70 mV. There are two parts to this positive increase in the cell
body membrane potential: the excitory postsynaptic potential, EPSP,
and the action potential (spike). The EPSP is the lower trace, which
is shown as a solid line. If the EPSP does not rise to suprathreshold
values, then it is the only signal recorded at the cell body. Further,
a subthreshold EPSP does not result in an action potential (spike)
being propagated down the axon.

[Figure 8.2.1. Comparison of the elements of embedding field theory and
the elements of the nervous system of a vertebrate. An interneuron is
shown synapsing on a motoneuron in the spinal column of a vertebrate.
Labels: event input pulse, electrical excitation, excitory postsynaptic
potential, action potential (spike), threshold, prediction signal,
arrowhead, collateral, magnification of synapse.]

When the EPSP rises to suprathreshold values, a spike is propagated
down the axon. In addition, the spike is "reflected" back into
the cell body, giving rise to the dotted line spike trace shown
superimposed on the EPSP.

The spike is formed at the point where the cell body narrows
down to form the axon. It propagates down the axon at a finite
velocity which is on the order of 5 meters/sec to 100 meters/sec. The
type of neuron and the covering on the axon determine the propagation
velocity. In a particular type of neuron, the propagation velocity
is fixed. All spikes are transmitted at the same velocity. Spikes
also always have the same amplitude and shape.

The end of an axon generally breaks up into a number of collaterals.
Each collateral ends in a swelled portion called a bouton. These
boutons are located immediately adjacent to another neuron's cell
body. The bouton-cell body junction is called a synapse. For this
reason the geometric arrangement of the neurons shown is described
as an interneuron "synapsing" on a motoneuron. We have shown the spike
propagated down the axon as it arrives at the synapse. Note that it
is delayed due to the finite transmission velocity.

A spike arriving at a synapse causes the adjacent cell body
membrane potential to rise from its resting potential with an EPSP.
If the EPSP rises to suprathreshold values, a spike is propagated
down this neuron's axon.








There is a short delay between the arrival of a spike at the
synapse and the beginning of an EPSP at the adjacent cell body. This
is because the cell body being synapsed upon is not excited electrically
by the spike. Instead, the spike causes the release of a chemical
substance into the space between the bouton and adjacent cell body.
This chemical substance is called transmitter. It causes the EPSP
in the synapsed upon cell body by changing the cell body's permeability
to different ionic species.

A magnification of a synapse is shown. The space between the bouton
and the cell body is called the synaptic cleft. Under an electron
microscope, the synaptic cleft is revealed to hold a number of small
particles called vesicles. It is currently believed that these vesicles
are packages of transmitter which burst open when a spike arrives at
the synapse.

The reason for these voltage traces is relatively easy to understand.
A neuron is surrounded by an interstitial fluid in which various ions
are dissolved. The interior of a neuron is also a fluid like substance
in which ions are soluble. The boundary between the interior of the
neuron and the interstitial fluid is a membrane which is selectively
permeable to ions. In a neuron at rest, the membrane is permeable
to potassium ions, K+, but reasonably impermeable to sodium ions, Na+.
There is additionally a "sodium pump" in the membrane which continuously
ejects Na+ ions from the neuron's interior. To maintain electrical
and chemical equilibrium of the overall system, there is a higher
concentration of K+ inside the neuron than outside. The reverse is
true for Na+. The result is that the interior of the neuron is
approximately 70 millivolts negative with respect to the interstitial fluid.








Electrical stimulation of the membrane results in a sudden change
in the membrane permeability. The membrane becomes permeable to
Na+ ions and they diffuse into the neuron. This results in a sudden
increase in the voltage of the neuron's interior with respect to the
interstitial fluid. In a very short time the membrane regains its
impermeability to Na+ ions. K+ ions then diffuse out of the neuron
to redress the equilibrium and the potential across the membrane drops
to the resting potential. The net effect is a small loss of K+ ions
and a small increase of Na+ ions inside the neuron. The sodium pump
will redress this in a short time. Thus with microelectrodes inserted
into the neuron the potential across the membrane can be measured
and electrical traces similar to those shown can be recorded.

Release of the transmitter substance in the synaptic cleft by a
spike causes similar membrane permeability changes which result in
an EPSP.

Next to the neurons we have shown the geometrical elements and
processes which occur in embedding field elements. Grossberg has
proposed the following analogy between the neurophysiological phenomena
in an organism and embedding field theory:

Embedding Field Theory          Living Organism

Geometric elements:
  node                          cell body
  directed edge                 axon
  arrowhead                     synapse

Processes:
  x process                     cell body membrane potential
  prediction signal             action potential (spike)
  amplitude of z process        amount of transmitter substance
                                available in synaptic cleft





Except for the last correspondence, figure 8.2.1 shows that the
analogy is in general very good. There are differences in detail which
we will take the time to explain here.

The x processes are not divided into an EPSP and a superimposed
spike. Further, the maximum amplitude of the prediction signal
is directly proportional to the amplitude of the x process which, in
turn, is directly proportional to the amplitude of the input pulse.
The amplitude of a spike on an axon is constant and independent of the
amplitude of the signal exciting the cell body.

Also, the situation we have shown on the interneuron is the
response to a single excitation of short duration and limited amplitude.
In the usual case the EPSP is suprathreshold for a reasonably long
time. This results in a barrage of spikes being propagated down the
axon. The frequency of these spikes is proportional to the strength
of the stimulus exciting the cell body. In Grossberg's proposal,
the amplitude of the portion of the x process that is suprathreshold
is considered to be proportional to the spiking frequency in a neuron.
Thus a prediction signal represents a barrage of spikes.
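
The reading of a prediction signal as a spiking frequency can be illustrated in a few lines. This is only a sketch: the threshold name `gamma` and the proportionality constant `gain` are hypothetical placeholders, not values from the thesis.

```python
def suprathreshold(x, gamma):
    """Portion of the x process above threshold gamma (zero if below)."""
    return max(x - gamma, 0.0)


def spike_frequency(x, gamma, gain):
    """Spiking frequency taken as proportional to the suprathreshold amplitude.

    gain is an assumed constant converting amplitude to spikes per second.
    """
    return gain * suprathreshold(x, gamma)
```

A subthreshold x process gives a frequency of zero (no spikes propagated), while doubling the suprathreshold amplitude doubles the assumed spiking frequency.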







section 8.3 Summary of the Theoretical Proposal for the
Neurophysiological Process of Learning in
Living Organisms


We have seen that an outstar network composed of embedding field
elements is capable of learning. The key to this ability is the z
process at an arrowhead. The z process at an arrowhead correlates the
prediction signal arriving at the arrowhead with the x process at the
adjacent node. It remembers this correlation in its amplitude and
allows prediction signals to excite the adjacent node proportional to
its amplitude. By writing down the equations governing the embedding
field network shown in figure 8.2.1, we can see this clearly:

$$\dot{x}_1(t) = -\alpha\,x_1(t) + P_1(t) \tag{8.3.1}$$

$$\dot{x}_2(t) = -\alpha\,x_2(t) + P_2(t) + \beta\,z_{12}(t)\,[x_1(t-\tau) - \Gamma]^+ \tag{8.3.2}$$

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + v\,[x_1(t-\tau) - \Gamma]^+\,x_2(t) \tag{8.3.3}$$

where:

$$[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$

From the $x_2(t)$ trace in figure 8.2.1, we can conclude that $z_{12}(t)$ has
already learned that $v_1$ and $v_2$ are associated. That is, $z_{12}(t) > 0$ and
is of sufficient amplitude to result in a well learned prediction response
by $x_2(t)$.
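
A two-node network of this kind is easy to integrate numerically. The following forward-Euler sketch assumes the system is: x1 decays and is driven by its input; x2 decays and is driven by its input plus the thresholded, delayed x1 weighted by z12; and z12 decays and grows with the product of the thresholded, delayed x1 and x2. All parameter values, the rectangular pulse shape, and the function name `simulate_pair` are illustrative assumptions, not the settings used in the thesis experiments.

```python
def simulate_pair(p1_times, p2_times, t_end=6.0, dt=0.001,
                  alpha=3.0, beta=1.0, gamma=0.1, u=0.05, v=1.0,
                  tau=0.3, pulse_amp=10.0, pulse_width=0.1):
    """Integrate the assumed two-node system and return z12 at t_end.

    p1_times, p2_times: start times (seconds) of rectangular input pulses
    delivered to node 1 and node 2 respectively.
    """
    n = int(round(t_end / dt))
    delay = int(round(tau / dt))

    def pulse(t, starts):
        # Rectangular input: pulse_amp while any pulse is active, else 0.
        return pulse_amp if any(s <= t < s + pulse_width for s in starts) else 0.0

    x1_hist = [0.0] * (n + 1)   # full history of x1, needed for the delay
    x2 = 0.0
    z12 = 0.0
    for k in range(n):
        t = k * dt
        x1 = x1_hist[k]
        x1_delayed = x1_hist[k - delay] if k >= delay else 0.0
        signal = max(x1_delayed - gamma, 0.0)      # [x1(t - tau) - Gamma]^+
        dx1 = -alpha * x1 + pulse(t, p1_times)
        dx2 = -alpha * x2 + pulse(t, p2_times) + beta * z12 * signal
        dz = -u * z12 + v * signal * x2
        # All derivatives use the old state; update afterwards.
        x1_hist[k + 1] = x1 + dt * dx1
        x2 += dt * dx2
        z12 += dt * dz
    return z12
```

For example, `simulate_pair([0.0, 1.0], [0.3, 1.3])` excites node 2 just as each prediction signal arrives and leaves z12 positive, whereas `simulate_pair([0.0, 1.0], [])` never excites node 2 and leaves z12 at zero. The sketch stores the whole x1 history for simplicity; a ring buffer of length tau/dt would do the same job with less memory.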

In order for the interneuron in figure 8.2.1 to excite the
motoneuron with spikes, there must be transmitter substance in the synaptic
clefts. If we make the amount of transmitter substance released by a
barrage of spikes proportional to $\beta\,z_{12}(t)\,[x_1(t - \tau) - \Gamma]^+$,
then the equations governing the embedding field network could accurately
describe the nervous network. If we further made the amount of
transmitter substance available for release proportional to the amplitude
of $z_{12}(t)$, then equation 8.3.3 could describe how, why, and how much
transmitter substance is available in the synaptic cleft. Grossberg
has proposed this as a concrete theoretical explanation of the
neurophysiological phenomena underlying learning in living organisms. His
proposal is that transmitter substance is produced in a synaptic
cleft at a rate proportional to the correlation of the frequency of
spikes arriving at the bouton and the membrane potential and/or spiking
frequency of the adjacent cell body. He has proposed additional
refinements and an exact mechanism which gives this result in reference
4.

It is doubtful that the ability of an interneuron to excite a
motoneuron in the spinal column of vertebrates is learned. As we have
said, the neurons selected for figure 8.2.1 were selected because of the
extensive information that has been collected on them. However, the
arrangement of neurons in the medulla, cerebellum, and cerebrum of
vertebrates is similar and we do know that learning occurs in these
organs. The similarity between the embedding field network and the
nervous network in figure 8.2.1 is uncanny. Grossberg has shown
theoretically, and we have shown experimentally, that embedding field
networks can learn. Thus Grossberg's proposal could explain learning
in organisms at the microscopic level. The proposal is even more
attractive when it is recalled that embedding field theory originated
as a model for the macroscopic psychological phenomena of learning.

This thesis originally intended to simulate Grossberg's proposal
in detail and compare it to existing neurophysiological experimental
data. However, the time was not available. A simplistic stab was
made in this direction. The reason was that nervous networks are
capable of transmitting a signal through a cascade of neurons without
"pulse lengthening" occurring. To solve this problem in an embedding
field node cascade, an attempt was made to model the embedding field
elements more closely to neurophysiological elements. At the same time,
attempts to implement one of the logics of table 7.4.1 in an outstar
were being made. The simplistic model of neurophysiological phenomena
proved to be such a logic. Because of these diverse reasons, the
simplistic model arrived at in this thesis is quite different from
Grossberg's proposal. In the next section we shall derive this model in
a somewhat logical manner. The reader may be assured that this was not
the historical progress of the model.





section 8.4 A Simplistic Model for the Neurophysiological Phenomena
in a Nervous Network Based on Embedding Field Theory


Suppose that we had two neurons, $v_1$ and $v_2$, arranged as in figure
8.2.1. Suppose further that excitement of the first neuron, $v_1$,
only results in one spike being generated per excitement. Also
suppose that we could excite the cell body of the second neuron, $v_2$,
with an input. As in embedding field theory, we are not concerned
here with how the inputs are delivered to the cell bodies. For the
sake of argument, suppose that transmitter substance is produced in
the synaptic cleft at a rate proportional to the correlation between
the membrane potential of a bouton and the membrane potential of the
adjacent cell body. For the purposes of this discussion, we will
assign a value of zero to the resting membrane potential at the bouton
and let $x_i(t)$ be the membrane potential of the $v_i$ cell body. Let
$z_{12}(t)$ be the "amount" or "concentration" of transmitter substance
present in the synaptic cleft. From our previous work we have a choice
of two formulations for $\dot{z}_{12}(t)$:

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + v\,x_1(t-\tau)\,x_2(t) \tag{8.4.1}$$

and the more general formulation:

$$\dot{z}_{12}(t) = -u\,z_{12}(t) + v\,\theta(M_z + z_{12}(t))\left[a\,(x_1(t-\tau) + x_2(t))^2 - b\,(x_1(t-\tau) - x_2(t))^2\right] \tag{8.4.2}$$

Now, we run into a problem. $x_1(t - \tau)$ and $x_2(t)$ are voltages.
$\dot{z}_{12}(t)$ is the rate of production of a chemical transmitter substance.
What are the chemical reactants which produce the transmitter substance?
How does it come about that a chemical substance is being produced at
a rate proportional to the product of voltages?





In our brief description of how membrane potentials come about,
we saw that these potentials are due to changes in the ionic
permeability of the neurons' membranes. Suppose that an ion or substance
diffuses or is released from the bouton when the membrane permeability
is changed by arrival of a spike. We will call this substance "B"
substance. Suppose further that a different ion or substance diffuses
or is released from the cell body when its membrane permeability is
changed by an EPSP or spike. We will call this substance "C" substance.
Suppose further that "B" substance and "C" substance are the reactants
which produce the transmitter substance. Since the transmitter substance
results in excitation of neuron $v_2$, we will call it "excitory transmitter
substance", or simply "E" substance.

How do the B and C substances combine to produce E substance,
and why would the rate for this reaction be proportional to the product
of voltages? The rate of reaction for biochemical reactions may be
governed by many things, including voltages. Due to the complexities
of biochemical processes, we could blatantly assume that the rate
of reaction for the combination of B and C substances into E substance
is proportional to the product, or the squares of the sum and difference,
of two voltages. However, we need not make this blatant assumption.
It is possible to allow B and C substances to combine according to a
very simple chemical reaction and this will result in all the desired
properties for production of E substance. The remainder of this section
will be devoted to this simple chemical reaction and its implications.

Let B and C substances combine to form E substance according to
the chemical reaction:

$$b\,\mathrm{B} + c\,\mathrm{C} = 1\,\mathrm{E} \tag{8.4.3}$$

where b is the number of moles of B and c is the number of moles
of C required to produce one mole of E.

Let this reaction occur instantly at body temperatures. That is,
if b moles of B and c moles of C are released into the synaptic cleft
at time $t_0$, then at any time $t > t_0$ only the end product of one mole
of E will be present in the cleft.

We will investigate the implications of equation 8.4.3 for the
production of E substance. The investigation will involve a number
of tricky conservation of reactants and end product equations. For
simplicity, we will make b = c = 1 in equation 8.4.3. That is:

$$1\,\mathrm{B} + 1\,\mathrm{C} = 1\,\mathrm{E} \tag{8.4.4}$$

Equation 8.4.4 will be used throughout. However it must be kept
in mind that equation 8.4.3 is the general situation and that we will
be investigating a special case.

Let $B_{12}(t)$ be the number of moles of B substance released from
a bouton into the synaptic cleft per second. Let $C_{12}(t)$ be the number
of moles of C substance released from the cell body per second into
the cleft. We can relate $B_{12}(t)$ and $C_{12}(t)$ to the membrane potentials,
$x_1(t - \tau)$ and $x_2(t)$.

The biochemical process which results in membrane potentials is
the selective permeability of the membranes. A positive increase in
a membrane potential is due to an increase in the membrane's
permeability to sodium ions, Na+. A decrease in membrane potential is due
to a decreased permeability to Na+ ions. As we discussed in section
8.2, the net effect of a spike or an EPSP on a neuron is a slight
increase of Na+ ions inside it and a compensating decrease of potassium,
K+. Now, suppose that B and C substances are held inside the membrane
when it is at rest potential. Suppose further that they diffuse
through the membrane with K+ ions to compensate for the increase of
Na+ ions. Since K+ ions diffuse out of a membrane when the membrane
potential is decreasing, we can say that:

(a) $B_{12}(t) > 0$ when $\dot{x}_1(t - \tau) < 0$

(b) $C_{12}(t) > 0$ when $\dot{x}_2(t) < 0$

Since the rate of diffusion of K+ ions is proportional to the rate
of change of membrane potential, let us go a bit further and say that:

(c) $B_{12}(t) = [-\dot{x}_1(t - \tau)]^+$

(d) $C_{12}(t) = [-\dot{x}_2(t)]^+$

where:

$$[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$

In other words, this says that the rate of release of B and C
into the synaptic cleft is directly proportional to the rate of decrease
of membrane potential.

Now, what happens to the B and C substances when they are released
into the synaptic cleft? If both are being released at the same time,
then E substance will be produced. This is exactly what we want. It
says that E substance will be produced if both $x_1(t - \tau)$ and $x_2(t)$
are decreasing at the same time. Although it ignores the increasing
leading edge of $x_1(t - \tau)$ and $x_2(t)$, it does correlate the decreasing
trailing edges. Further, this process corresponds to known physical
facts. That is, when membrane potentials are decreasing, at least one
substance from inside the membrane is diffusing out of it.


However, there is a catch. Suppose a spike has excited the bouton
recently, but no EPSP or spike has excited the adjacent cell body.
Then B substance will have been released into the synaptic cleft and
there will be a net amount of it present for all time after arrival
of the spike at the bouton. Thus, if a few days later the adjacent
cell body is excited by an EPSP or a spike, E substance will be
produced. In embedding field terms, the association between $v_1$ and
$v_2$ will be learned. One of the key tenets of embedding field theory is
that such an association can only be learned when $v_1$ and $v_2$ have been
excited in close temporal proximity. Thus, we cannot allow excess B or
C substances to accumulate in the cleft.

There are three methods of preventing excess B or C substance from
accumulating in the cleft. It can diffuse out of the cleft, it can be
readsorbed into the bouton or cell body from which it came, or it
can be rendered inactive by chemical reactions. There is no reason for
preferring one of these methods to another here. We will arbitrarily
choose the chemical reaction and say that B and C substances are
deactivated at a finite rate to prevent accumulation.

Let $b_{12}(t)$ be the number of moles of B in the cleft at time t.
Let $c_{12}(t)$ be the number of moles of C in the cleft at time t.
Then we can say that:

$$\dot{b}_{12}(t) = -\omega\,b_{12}(t) \tag{8.4.5}$$

$$\dot{c}_{12}(t) = -\omega\,c_{12}(t) \tag{8.4.6}$$

We now have a "correlating" process. The amount of E substance
in the cleft, $z_{12}(t)$, will grow when a spike excites the bouton in
close temporal proximity to the excitement of the adjacent cell body.
It will not grow if they are not in close temporal proximity.
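
The effect of a finite deactivation rate on the "few days later" scenario can be worked out directly, since a deactivation law of this kind is an ordinary first-order decay. The following is a sketch under the assumption that no further B is released and no C is present after time $t_0$; the symbol $\omega$ for the deactivation rate is used here as an assumed name.

```latex
\dot{b}_{12}(t) = -\omega\, b_{12}(t)
\quad\Longrightarrow\quad
b_{12}(t) = b_{12}(t_0)\, e^{-\omega (t - t_0)}, \qquad t \ge t_0 .
```

The amount of B left in the cleft falls to a fraction $e^{-1} \approx 0.37$ of its initial value after $1/\omega$ seconds and to less than one percent after $5/\omega$ seconds. A release of C arriving much later than $1/\omega$ therefore finds essentially nothing to react with, which is what restricts learning to close temporal proximity.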


We must now develop a mathematical description of the production
of E substance in the cleft as a function of the membrane potentials
$x_2(t)$ and $x_1(t - \tau)$. Thus far we have reached the following results:

$$1\,\mathrm{B} + 1\,\mathrm{C} = 1\,\mathrm{E} \quad \text{(instantaneous rate)} \tag{8.4.7}$$

$$\dot{b}_{12}(t) = -\omega\,b_{12}(t) \tag{8.4.8}$$

$$\dot{c}_{12}(t) = -\omega\,c_{12}(t) \tag{8.4.9}$$

$$B_{12}(t) = [-\dot{x}_1(t - \tau)]^+ \tag{8.4.10}$$

$$C_{12}(t) = [-\dot{x}_2(t)]^+ \tag{8.4.11}$$

where:

$x_1(t - \tau)$ is the membrane potential of the bouton.

$x_2(t)$ is the membrane potential of the adjacent cell body.

$B_{12}(t)$ is the number of moles of B released into the cleft from the
bouton per second.

$C_{12}(t)$ is the number of moles of C released into the cleft from the
cell body per second.

$b_{12}(t)$ is the net number of moles of B in the cleft at time t.

$c_{12}(t)$ is the number of moles of C in the cleft at time t.


Because of 8.4.7, either $b_{12}(t)$ or $c_{12}(t)$ is zero at any given
time. Also because of 8.4.7:

$$\dot{z}_{12}(t) = [\min(b_{12}(t), c_{12}(t))]^+ \tag{8.4.12}$$

where:

$$[\min(x, y)]^+ = \begin{cases} \min(x, y) & \text{if } x > 0 \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}$$

This simply says that if there is $b_{12}(t)$ of B substance in the cleft,
and we release $c_{12}(t) < b_{12}(t)$ of C substance into the cleft, then
instantaneously $c_{12}(t)$ will be used up to produce E. $\dot{z}_{12}(t)$ is
restricted to be positive because there simply cannot be a negative
number of moles of B or C in the cleft.


Equation 8.4.12 will describe $z_{12}(t)$ as a function of $x_1(t - \tau)$
and $x_2(t)$ if we can develop equations relating $b_{12}(t)$ and $c_{12}(t)$ to
$x_1(t - \tau)$ and $x_2(t)$ respectively. Let us consider the conservation
of B in the cleft:

(i) $B_{12}(t)\,dt = [-\dot{x}_1(t - \tau)]^+\,dt$ of B substance is released into
the cleft per time interval dt.

(ii) $[\min(b_{12}(t), c_{12}(t))]^+\,dt$ of B substance is converted to
E substance in time interval dt.

(iii) The reaction to produce E substance is instantaneous.
Therefore if there is $b_{12}(t)$ of B substance in the cleft
and $c_{12}(t)$ of C is added at $t = t_0$, then at $t > t_0$ there
can be at most $[b_{12}(t) - c_{12}(t)]^+$ of B in the cleft.
This is the amount of B substance which will be available
for deactivation. Therefore there is:

$$\omega\,[b_{12}(t) - c_{12}(t)]^+\,dt$$

of B substance deactivated per time interval dt.

Therefore:

$$\dot{b}_{12}(t) = [-\dot{x}_1(t - \tau)]^+ - \omega\,[b_{12}(t) - c_{12}(t)]^+ - [\min(b_{12}(t), c_{12}(t))]^+ \tag{8.4.13}$$

Similarly:

$$\dot{c}_{12}(t) = [-\dot{x}_2(t)]^+ - \omega\,[c_{12}(t) - b_{12}(t)]^+ - [\min(b_{12}(t), c_{12}(t))]^+ \tag{8.4.14}$$

Equations 8.4.13 and 8.4.14 coupled with 8.4.12 completely describe
the process whereby the voltages $x_1(t - \tau)$ and $x_2(t)$ are converted into
the chemical substance E. As they are rather complicated, a system
diagram was drawn and is shown in figure 8.4.1.
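
The bookkeeping behind this process can also be followed step by step in discrete time. The sketch below is not a transcription of the numbered equations; it is one simple interpretation of the prose: in each time step B and C are released in proportion to the decrease of their membrane potentials, the instantaneous reaction converts the scarcer reactant into E, and whatever excess remains is deactivated at a finite rate. The names `simulate_cleft`, `ramp_pulse`, and `omega`, and all numeric values, are hypothetical.

```python
def ramp_pulse(n, start, rise, fall, amp=1.0):
    """Piecewise-linear pulse of n samples: linear rise, then linear fall."""
    out = []
    for k in range(n):
        if start <= k < start + rise:
            out.append(amp * (k - start) / rise)
        elif start + rise <= k < start + rise + fall:
            out.append(amp * (1.0 - (k - start - rise) / fall))
        else:
            out.append(0.0)
    return out


def simulate_cleft(x1_delayed, x2, dt, omega):
    """Return the running total of E substance produced in the cleft.

    x1_delayed, x2: equal-length sampled membrane potentials of the bouton
    and of the adjacent cell body. omega: assumed deactivation rate (1/sec).
    """
    b = c = e = 0.0
    history = [0.0]
    for k in range(1, len(x1_delayed)):
        # Release over one step equals the decrease in potential (if any).
        b += max(-(x1_delayed[k] - x1_delayed[k - 1]), 0.0)
        c += max(-(x2[k] - x2[k - 1]), 0.0)
        # Instantaneous 1 B + 1 C -> 1 E: the scarcer reactant is used up.
        made = min(b, c)
        b -= made
        c -= made
        e += made
        # Finite-rate deactivation of the leftover reactant.
        keep = max(1.0 - omega * dt, 0.0)
        b *= keep
        c *= keep
        history.append(e)
    return history
```

As a usage example, with `dt = 0.001` and `omega = 20.0`, two pulses whose trailing edges coincide (`ramp_pulse(2000, 100, 50, 100)` for both inputs) convert essentially all of the released B and C into E, while moving the second pulse to start at sample 1500 leaves almost no B in the cleft by the time C is released, so very little E is produced.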


[Figure 8.4.1. System diagram for production of E substance. Blocks:
bouton membrane, cell body membrane, deactivation of B, deactivation
of C, E substance produced.]





A signal from one neuron is transmitted to another by the release
of transmitter substance in the synaptic cleft. Having developed a
model for the production of transmitter substance, we must now model
how this substance is used in the transmission of signals. Let us
assume that the transmitter substance produced by our reaction is
contained in vesicles in the synaptic cleft. Under normal
circumstances, it is safely packaged in these vesicles and unable to
affect the permeability of the adjacent cell body membrane. However,
when a spike arrives at the bouton, the vesicles suddenly burst and the
transmitter is released to attack the cell body membrane. How does
the spike cause the vesicles to burst?

Again, since we are dealing with a biochemical system, there is no
obvious method. Let us consider the events associated with the arrival
of the spike at the bouton and see if there is any reason for the
vesicles to burst. Arrival of the spike at the bouton begins with a
rapid diffusion of Na+ ions into the bouton. Here we have two possible
reasons for the vesicles to burst. Firstly, before the arrival of
the spike, the bouton and the cell body are at zero potential relative
to one another. When the spike begins to arrive at the bouton, the
potential of the bouton rapidly increases relative to the potential of
the cell body. Thus we could conceive of the vesicles being pulled
apart by electrostatic forces. This would require dipolar vesicles: one
end of the vesicle would have to be at a different potential with
respect to the other end. If transmitter were released by this method,
then it would most likely be released before the spike peaks.

On the other hand, we could conceive of the vesicles bursting
due to the sudden infusion of Na+ into the bouton. The detailed
mechanism would require that the normal Na+ concentration in the
synaptic cleft be greater than that inside the bouton, as is the case
with the interstitial fluid surrounding the neuron. Then the beginning
of the arrival of a spike at the bouton would cause the Na+ to diffuse
out of the cleft into the bouton. Since the volume of the cleft is
small compared to that of the bouton, this process would rapidly deplete
the cleft of Na+. If sodium is required to keep the vesicles together,
they would come apart when a spike arrives at the bouton. Another
mechanism that would have the same result would be to surround the
vesicles with a membrane that is permeable to Na+ and H2O. Then the
sudden depletion of Na+ in the cleft would also deplete the vesicles
of Na+. The result would be an osmotically compensating insurge of
H2O into the vesicles. With sufficient Na+ depletion, enough H2O would
enter the vesicles to burst them, similarly to hemolysis in red blood
cells (Ref. 12, p. 13). Again this method would most likely release
transmitter before the spike peaks.

We could conceive of other mechanisms to cause vesicles to burst.
However, we have two likely candidates which cause them to burst before
the spike peaks. Our process for the production of transmitter begins
to operate after the spike has peaked and begun to decay.

If the process which releases transmitter operates at the same
time as the production process, we will be releasing the transmitter
as we produce it. Thus, to make our system work well, we must separate
the transmitter release and production processes. For this practical
reason, and the fact that it could work, we will release transmitter
when the bouton membrane potential is increasing. That is, transmitter
will be released when:

$$\dot{x}_i(t-\tau) > 0$$





We must now decide how much transmitter is released. For simplicity
let us assume that all the transmitter in the cleft is released when
the bouton membrane potential begins to increase. We will further
assume that all the released transmitter immediately changes the
permeability of the adjacent cell body membrane and results in an
immediate increase in the cell body's membrane potential. Note that this
implies that arrival of the spike at the bouton causes an impulsive
excitement of the adjacent cell body.

We need to decide one further thing. Release of one mole of E
substance will result in a cell membrane potential increase of how
many volts?

We will arbitrarily say that release of one mole of E will result
in a cell body membrane potential increase of a volts.

In summary, our transmitter releasing process does the following:
Suppose that there are $z_{ij}(t)$ moles of E present in the cleft. Then
any increase of the bouton membrane potential above resting potential
will cause the release of $z_{ij}(t)$ moles of E. This will in turn cause
an immediate increase in the adjacent cell body membrane potential of
$a\,z_{ij}(t)$ volts. The E released is used up causing the $x_j(t)$
membrane potential to increase. Thus $z_{ij}(t) = 0$ immediately after
release of the E.
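
The release rule just summarized can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the program used for the simulations in this study; the function name `release_transmitter` and its arguments are assumptions introduced here.

```python
def release_transmitter(z, a, bouton_rising):
    """Return (jump in cell body potential, E remaining in the cleft).

    z             -- moles of E stored in the cleft
    a             -- effectiveness factor, volts per mole of E released
    bouton_rising -- True while the bouton membrane potential is increasing
    """
    if bouton_rising and z > 0.0:
        # All stored E is released at once and used up by the cell body.
        return a * z, 0.0
    # Otherwise the E stays packaged in the cleft and has no effect.
    return 0.0, z
```

For example, with 10 moles stored and a = 1.0, a rising bouton potential gives a jump of 10 volts and leaves the cleft empty.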

In order to use this simplistic model for the production and release
of transmitter in the synaptic cleft, we must also model the membrane
potential responses of cell bodies and axons. At the beginning of the
modeling process, we said that we were only interested in the propagation
of a single spike across the synaptic cleft. Our model for the membrane
response at other parts of the system thus need only account for a single
spike. Rather than going through the laborious process of finding
processes which will exactly duplicate the membrane potential traces
shown in figure 8.2.1, we will adopt the formulation for x processes
at a node in an embedding field. Further, we will not consider thresholds
in this study.

With these assumptions, suppositions, and modeling done, we
are in a position to write down a complete set of equations governing
this simplistic model for a nervous network. We will summarize the
notations used and then write down the equations.

The equations and notations will be presented in a generalized
form. Since this is just a reformulation of the embedding field network
equations, we will number the cell bodies in a nervous network
and refer to them as the "$V_i$" cell body. All synapses between boutons
connected to the $V_i$ cell body by axons and the $V_j$ cell body will be
referred to by the dual subscript ij. The first subscript, i, shows
the direction a signal is coming from and the second subscript shows
the direction it is traveling toward across the synapse.

Chemistry:

B substance is a chemical substance released from a bouton into
a synaptic cleft when the bouton's membrane potential is decreasing.

C substance is a chemical substance different from B substance
which is released from the cell body into synaptic clefts when the cell
body membrane potential is decreasing.

E substance is excitatory transmitter substance. It is produced
by the instantaneous reaction:

    1·B + 1·C → 1·E

At all times when the bouton membrane potential is at resting potential
or decreasing, the E substance is stored in the synaptic cleft and is
unable to affect membrane potentials. When the bouton membrane potential
is increasing, all the E substance in the cleft is immediately released.
When it is released it immediately causes an increase in the adjacent
cell body membrane potential of a volts per mole of E substance released.
The E substance released is used up causing the cell body membrane
potential to increase.
Variables:

$P_i(t)$ is an input signal delivered directly to the $V_i$ cell body
from the environment.

$x_i(t)$ is the cell body membrane potential of the $V_i$ cell body
in the nervous network.

$x_i(t-\tau)$ is the membrane potential of the boutons connected to
the $V_i$ cell body by axons.

$z_{ij}(t)$ is the number of moles of E substance present in the synaptic
cleft between boutons connected to the $V_i$ cell body by axons and the
$V_j$ cell body.

$b_{ij}(t)$ is the net amount of B substance in moles in the ij synaptic
cleft at time t.

$c_{ij}(t)$ is the net amount of C substance in moles in the ij synaptic
cleft at time t.
Constants:

$\alpha$ is the decay rate for membrane potentials.

$\omega_b$ is the deactivation rate for B substance in a synaptic cleft.

$\omega_c$ is the deactivation rate for C substance in a synaptic cleft.

a is the released transmitter effectiveness factor on a cell
body membrane potential. One mole of E substance released in a synaptic
cleft results in an increase in the adjacent cell body's membrane
potential of a volts.

$\tau$ is the time interval between origination of a spike at a cell body
and its arrival at the boutons attached to that cell body by axons.


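The equations governing the system's performance:

$$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + a \sum_j R(\dot{x}_j(t-\tau))\,z_{ji}(t) \qquad (8.4.15)$$

where $R(\dot{x}_j(t-\tau))\,z_{ji}(t)$ is a special function defined by:

$$R(\dot{x}_j(t-\tau))\,z_{ji}(t) = \begin{cases} 0 & \text{if } \dot{x}_j(t-\tau) \le 0 \\ \text{an impulse of amplitude } z_{ji}(t) & \text{when } \dot{x}_j(t-\tau) > 0 \end{cases}$$

$$\dot{z}_{ij}(t) = [\min(b_{ij}(t), c_{ij}(t))]^+ - R(\dot{x}_i(t-\tau))\,z_{ij}(t) \qquad (8.4.16)$$

where:

$$[\min(x, y)]^+ = \begin{cases} x & \text{if } x \le y \text{ and } x > 0 \\ y & \text{if } y \le x \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}$$

$$\dot{b}_{ij}(t) = [-\dot{x}_i(t-\tau)]^+ - \omega_b\,[b_{ij}(t) - c_{ij}(t)]^+ - [\min(b_{ij}(t), c_{ij}(t))]^+ \qquad (8.4.17)$$

where:

$$[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$

$$\dot{c}_{ij}(t) = [-\dot{x}_j(t)]^+ - \omega_c\,[c_{ij}(t) - b_{ij}(t)]^+ - [\min(b_{ij}(t), c_{ij}(t))]^+ \qquad (8.4.18)$$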
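
To make the bracket notation concrete, the following Python sketch evaluates the right-hand sides of equations 8.4.16 through 8.4.18 at one instant between transmitter releases. It is a hedged illustration only: the function names `pos`, `min_pos`, and `cleft_rates` are assumptions introduced here, finite deactivation rates are assumed, and the impulsive release term R is deliberately left out.

```python
def pos(y):
    """[y]^+ : y when y > 0, otherwise 0."""
    return y if y > 0.0 else 0.0


def min_pos(x, y):
    """[min(x, y)]^+ : the smaller of x and y when both are positive, else 0."""
    return min(x, y) if (x > 0.0 and y > 0.0) else 0.0


def cleft_rates(b, c, dx_bouton, dx_cell, w_b, w_c):
    """Rates of change (db, dc, dz) in one cleft, release term omitted.

    b, c      -- amounts of B and C substance in the cleft
    dx_bouton -- time derivative of the bouton potential x_i(t - tau)
    dx_cell   -- time derivative of the cell body potential x_j(t)
    w_b, w_c  -- finite deactivation rates for B and C (assumed finite here)
    """
    made = min_pos(b, c)                             # E production term
    db = pos(-dx_bouton) - w_b * pos(b - c) - made   # equation 8.4.17
    dc = pos(-dx_cell) - w_c * pos(c - b) - made     # equation 8.4.18
    dz = made                                        # equation 8.4.16, no release
    return db, dc, dz
```

A forward-difference simulation would call `cleft_rates` once per time step and handle the release of E separately whenever the bouton potential is rising.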





Section 8.5 Experiments with the Simplistic Neurophysiological Model


Equations 8.4.15 through 8.4.18 look formidable. They were
simulated on a digital computer and it was experimentally verified
that they work. A simple network consisting of one neuron, $V_1$,
synapsing on another, $V_2$, was used. Since the method for the
excitement of a cell body by transmitter substance is an impulse, the
external inputs $P_1(t)$ and $P_2(t)$ were specified to be impulses of
amplitude 10. The remaining parameters were selected arbitrarily to be:

$\alpha = 3.3333\ \text{sec}^{-1}$

$\tau = 0.3$ sec

$\omega_b = \infty$

$\omega_c = 0.5$

$a = 1.0$


Figure 8.5.1 shows the results. The impulse input $P_2(t)$ was
presented to cell body $V_2$ exactly at the instant that the first spike
$x_1(t-\tau)$ arrived at the 1,2 synapse. Thus the signals $x_1(t-\tau)$
and $x_2(t)$ exactly correlated. Therefore the amount of B substance
entering the cleft per second was exactly equal to the amount of C
substance. Thus all the B and C substance was used up instantly to
produce E, as is shown by the zero $b_{12}(t)$ and $c_{12}(t)$ traces.
The amount of E produced was exactly enough to cause $x_1(t-\tau)$ to
exactly correlate with $x_2(t)$ on the second response. Again all B and C
was used up producing E and the amount of E produced was the same as
before.

[Figure 8.5.1. A simple experiment to test the simplistic
neurophysiological model. Traces plotted against time.]

Since all the B and C substance was used up instantly to produce
E, we can analytically compute the traces in figure 8.5.1. The response
of $V_1$ to an impulse is:

$$x_1(t) = 10\,e^{-\alpha[t-0.1]^+} \quad \text{for } t \ge 0.1\ \text{sec.}$$

Allowing for the transmission delay between cell body $V_1$ and the
bouton, the bouton membrane potential is:

$$x_1(t-\tau) = 10\,e^{-\alpha[t-0.4]^+} \quad \text{for } t \ge 0.4\ \text{sec.}$$

The response of $V_2$ to the impulse $P_2(t)$ is:

$$x_2(t) = 10\,e^{-\alpha[t-0.4]^+} \quad \text{for } t \ge 0.4\ \text{sec.}$$

The amount of B entering the synaptic cleft is:

$$\dot{b}_{12}(t) = [-\dot{x}_1(t-\tau)]^+ = 10\,\alpha\,e^{-\alpha[t-0.4]^+} \quad \text{for } t \ge 0.4\ \text{sec.}$$

Similarly, the amount of C entering the cleft is:

$$\dot{c}_{12}(t) = [-\dot{x}_2(t)]^+ = 10\,\alpha\,e^{-\alpha[t-0.4]^+} \quad \text{for } t \ge 0.4\ \text{sec.}$$

Since all the B and C entering the cleft is instantly used up
producing E,

$$\dot{z}_{12}(t) = \dot{b}_{12}(t) = \dot{c}_{12}(t)$$

or:

$$z_{12}(t) = \int_{0.4}^{t} 10\,\alpha\,e^{-\alpha[t-0.4]^+}\,dt = 10\left(1 - e^{-\alpha[t-0.4]^+}\right),$$

which is exactly what figure 8.5.1 shows.
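
As a check on this closed form, the short Python sketch below integrates the B release rate numerically and compares it with $10(1 - e^{-\alpha[t-0.4]})$. It is offered as an illustration under the parameter values quoted above, not as the simulation program itself; the names `b_rate`, `z_closed`, and `z_numeric` are assumptions introduced here.

```python
import math

ALPHA = 3.3333    # decay rate from the parameter list
T_ARRIVE = 0.4    # time at which the spike reaches the bouton


def b_rate(t):
    """Rate at which B enters the cleft: 10*alpha*exp(-alpha*(t - 0.4))."""
    if t < T_ARRIVE:
        return 0.0
    return 10.0 * ALPHA * math.exp(-ALPHA * (t - T_ARRIVE))


def z_closed(t):
    """Closed-form amount of E: 10*(1 - exp(-alpha*(t - 0.4)))."""
    return 10.0 * (1.0 - math.exp(-ALPHA * max(t - T_ARRIVE, 0.0)))


def z_numeric(t, steps=20000):
    """Trapezoidal integral of b_rate from T_ARRIVE to t."""
    if t <= T_ARRIVE:
        return 0.0
    h = (t - T_ARRIVE) / steps
    total = 0.5 * (b_rate(T_ARRIVE) + b_rate(t))
    for k in range(1, steps):
        total += b_rate(T_ARRIVE + k * h)
    return total * h
```

The two functions agree closely at any time after the spike arrives, and `z_closed` approaches 10 moles for large t.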

The second $x_2(t)$ response is due to release of E by the suddenly
increasing leading edge of $x_1(t-\tau)$. This results in the
instantaneous release of all E in the cleft, as is shown by the
$z_{12}(t)$ trace. From equation 8.4.15, the release of the E results in
an instantaneous increase in the amplitude of $x_2(t)$ to a value of
$a\,z_{12}(t)$. As a = 1.0 and $z_{12}(t)$ = 10.5, $x_2(t)$ suddenly
jumps to a value of 10.5 as shown. When the sharp increasing leading edge
of $x_1(t-\tau)$ is over, no more transmitter is released and the
production process begins to produce E substance.

The amplitudes are the same as in the first response and the same
amount of E is produced again. As long as the amplitude of the impulse
exciting $V_1$ is kept at a value of 10, the same traces will be produced
for as many excitements of $V_1$ as we desire. We will analytically
verify this statement shortly.

If we consider the traces $x_1(t)$, $x_1(t-\tau)$, and $x_2(t)$ to be
spikes, then the assumption that the input impulse amplitudes will remain
constant is realistic. Spikes are always of the same amplitude and
duration in a particular species of neurons. Note that once the
transmitter substance was formed, arrival of the spike $x_1(t-\tau)$ at
the bouton had the same effect as an input impulse on $x_2(t)$. Thus we
may consider that our input impulses, $P_1(t)$ and $P_2(t)$, are the
effects of spikes arriving at boutons synapsing on $V_1$ and $V_2$ which
already have 10 molar units of E substance present in their synaptic
clefts.

Figure 8.5.1 shows the result of the special case of a spike
arriving at the 1,2 synapse at exactly the same time that $V_2$ is excited
by an input impulse. To check the ability of these networks to learn
when the input impulse to a cell body is delivered at a time different
from the instant that a spike arrives at the synapse, another experiment
was performed. One cell, $V_c$, was arranged so that it synapses on
5 other cell bodies in an outstar arrangement. The parameters in the
network were kept the same as in the previous experiment. Figure 8.5.2
shows the arrangement of the neurons and the results.

The amount of B substance in the clefts, $b_{ci}(t)$, was zero at all
times. This is because the deactivation rate for B, $\omega_b$, was
infinite. In the simulation, it was considered that the amount of B
entering the cleft in an infinitesimal time interval, dt, was made
available to react with any C present to form E. If there was any B left
over after this reaction, it was immediately deactivated before any more
B entered the cleft in the next infinitesimal time interval.

[Figure 8.5.2. A more complex experiment with the simplistic
neurophysiological model. Traces plotted against time.]

Nevertheless, figure 8.5.2 is a good look at the processes going
on in this model. The $c_{c1}(t)$ and $z_{c1}(t)$ traces show the
instantaneousness of the E production reaction. $V_1$ was excited by an
input impulse before the spike from $V_c$ arrived at the boutons. Thus C
was released into the c,1 synaptic cleft and began to be deactivated.
When the spike arrived at the c,1 bouton, B was released into the cleft.
Since there was more B being released into the cleft than there was C
present in the cleft, all the C was instantly used up producing E. Thus
$c_{c1}(t)$ suddenly drops to zero when the spike arrives at the c,1
bouton at t = 0.4. However, enough of the C released by $V_1$ had already
been deactivated when the spike arrived to allow $z_{c1}(t)$ to rise to
a value of only 5.
The traces associated with $V_2$ are exactly the same as those
associated with $V_2$ in the previous experiment. The spike arrived at
the c,2 bouton at exactly the same instant that the $P_2(t)$ input
impulse was delivered to $V_2$. Thus all the C and B released was used up
producing E.

The traces associated with $V_3$, $V_4$, and $V_5$ show what happened
when the input impulses were delivered to the cell bodies after arrival
of the spike at the boutons. Because of the infinite deactivation rate
for B, there was no accumulation of B in the cleft. Thus only the amount
of B entering the cleft when these cell bodies were excited is available
for reaction with C to form E. Remember that the B entering the
cleft is:

$$\dot{b}_{ci}(t) = [-\dot{x}_c(t-\tau)]^+$$

and the C entering the cleft is:

$$\dot{c}_{ci}(t) = [-\dot{x}_i(t)]^+$$
$V_3$ was excited by the input impulse $P_3(t)$ at time t = 0.5. The
spike arrived at the bouton at t = 0.4. Because of the infinite
deactivation rate for B, all the B which entered the cleft before t = 0.5
was deactivated instantly. Thus the B available for reaction with the C
which begins to enter the cleft after t = 0.5 is:

$$\dot{b}_{c3}(t) = [-\dot{x}_c(t-\tau)]^+ = 10\,\alpha\,e^{-0.1\alpha}\,e^{-\alpha[t-0.5]^+} \quad \text{for } t \ge 0.5\ \text{sec.}$$

The C which enters the cleft after t = 0.5 is:

$$\dot{c}_{c3}(t) = [-\dot{x}_3(t)]^+ = 10\,\alpha\,e^{-\alpha[t-0.5]^+} \quad \text{for } t \ge 0.5\ \text{sec.}$$

Thus, the amount of C entering the cleft is a factor of $e^{0.1\alpha}$
greater than the B entering the cleft. The reaction 1·B + 1·C → 1·E
is instantaneous and the coefficients of unity mean that
$[\min(b_{c3}(t), c_{c3}(t))]^+$ of B is converted to E immediately upon
entering the cleft. Since $\dot{b}_{c3}(t)$ is less than
$\dot{c}_{c3}(t)$, all of the B is converted to E. Knowing this, we can
analytically compute the amount of E produced:

$$\dot{z}_{c3}(t) = [\min(b_{c3}(t), c_{c3}(t))]^+ = \dot{b}_{c3}(t)$$

This last conclusion is a technical point. Since all the B entering
the cleft is immediately used up, there can be no accumulation of B
and $b_{c3}(t)$ is technically zero. However, in an infinitesimal time
interval, dt, $\dot{b}_{c3}(t)\,dt$ of B did enter the cleft. We must
hypothesize an infinitesimal accumulation of B in the cleft of:

$$b_{c3}(t) = \dot{b}_{c3}(t)\,dt$$

Since $b_{c3}(t) < c_{c3}(t)$ at all times, the amount of E produced
during the time interval dt is:

$$dz_{c3}(t) = b_{c3}(t) = \dot{b}_{c3}(t)\,dt$$

Thus $\dot{z}_{c3}(t) = \dot{b}_{c3}(t)$.

The E produced at any time t > 0.5 is:

$$z_{c3}(t) = \int_{0.5}^{t} \dot{b}_{c3}(t)\,dt = \int_{0.5}^{t} [-\dot{x}_c(t-\tau)]^+\,dt = \int_{0.5}^{t} 10\,\alpha\,e^{-0.1\alpha}\,e^{-\alpha[t-0.5]^+}\,dt = 10\,e^{-0.1\alpha}\left(1 - e^{-\alpha[t-0.5]^+}\right)$$

For times sufficiently greater than t = 0.5, the E produced is:

$$z_{c3}(t \gg 0.5) = 10\,e^{-0.1\alpha}$$

For $\alpha$ = 3.3333, this gives us:

$$z_{c3}(t \gg 0.5) \approx 7.17$$

which agrees very well with the experimental results shown on the
$z_{c3}(t)$ traces in figure 8.5.2.
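
The limiting value can be evaluated directly. The Python sketch below assumes the closed form derived above, read here as $z(t) = 10\,e^{-\alpha \cdot lag}(1 - e^{-\alpha[t - t_{excite}]})$, where the lag is the time by which the input impulse follows the arrival of the spike (0.1 sec for $V_3$). The function names and default arguments are assumptions introduced for illustration.

```python
import math


def e_produced(t, alpha=3.3333, lag=0.1, t_excite=0.5, amplitude=10.0):
    """Amount of E in the cleft at time t when the cell body is excited
    `lag` seconds after the spike reaches the bouton."""
    if t <= t_excite:
        return 0.0
    return amplitude * math.exp(-alpha * lag) * (1.0 - math.exp(-alpha * (t - t_excite)))


def e_limit(alpha=3.3333, lag=0.1, amplitude=10.0):
    """Value approached for t much greater than the excitation time."""
    return amplitude * math.exp(-alpha * lag)
```

Under this reading, a larger lag gives a smaller limit, which is the qualitative behavior expected for cell bodies excited progressively later than the spike arrival.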
Since there was more C than B entering the c,3 cleft, and since
C was deactivated at a finite rate, there is an accumulation of C in
the cleft. The $c_{c3}(t)$ trace shows this accumulation and its
deactivation.

The traces associated with $V_4$ and $V_5$ are similar to those
associated with $V_3$. The only difference is that $V_4$ and $V_5$ were
excited by input impulses at progressively later times than $V_3$.


The second response shown on all the traces is a "prediction"
response. The command cell body, $V_c$, was excited by an input impulse
alone. The spike so generated traveled down the axons to the "grid"
cell bodies, $V_1$ through $V_5$. When it arrived, it instantly released
all the transmitter E substance in the synaptic cleft. Each of the "grid"
cell bodies was excited to a membrane potential of $a\,z_{ci}(t = 2.2)$.
In this case, there was no time difference between the arrival of the
spike at the boutons and the excitement of the grid cell bodies.
Both events occurred at t = 2.2. Thus the amount of B being released
into the clefts which could react with C to form E was:

$$\dot{b}_{ci}(t) = [-\dot{x}_c(t-\tau)]^+ = 10\,\alpha\,e^{-\alpha[t-2.2]^+} \quad \text{for } t \ge 2.2$$

However, the amount of C being released into the clefts at the same
time was:

$$\dot{c}_{ci}(t) = [-\dot{x}_i(t)]^+ = a\,z_{ci}(t = 2.2)\,\alpha\,e^{-\alpha[t-2.2]^+} \quad \text{for } t \ge 2.2$$

In all cases, the amount of C being released was less than or equal to
the amount of B being released. Thus:

$$\dot{z}_{ci}(t) = a\,z_{ci}(t = 2.2)\,\alpha\,e^{-\alpha[t-2.2]^+} \quad \text{for } t \ge 2.2$$

or:

$$z_{ci}(t) = a\,z_{ci}(t = 2.2) \int_{2.2}^{t} \alpha\,e^{-\alpha[t-2.2]^+}\,dt = a\,z_{ci}(t = 2.2)\left(1 - e^{-\alpha[t-2.2]^+}\right)$$

For t sufficiently greater than t = 2.2:

$$z_{ci}(t \gg 2.2) = a\,z_{ci}(t = 2.2)$$

which is what the $z_{ci}(t)$ traces in figure 8.5.2 show. Note that the
effect of a prediction excitement of the grid cell bodies is to produce
the exact amount of E after excitement as there was before the excitement.
In this sense, the network is self-sustaining. We can continue to excite
the grid cell bodies with prediction spikes for as long as we want.
The result will be the same as the prediction response shown.

Because the amount of B being released was always greater than or
equal to the amount of C being released during the prediction excitement,
there is no accumulation of C in the clefts. The $c_{ci}(t)$ traces are
therefore zero during the prediction excitement.





section 8.6 Inhibition and an & 3, Logic 


We now have a simplistic model of a nervous system that is a
synthesis of some neurophysiological facts, some assumptions, and
embedding field theory. Although much thought went into the modeling
process, we cannot pretend the model is accurate. The fact that the
model does work is a powerful argument for a deeper study of the
embedding field theoretical assumptions concerning learning at the
microscopic level in living organisms.

The time was not available for that deeper study. Shortly,
we will drop the neurophysiological names that have been attached
to the elements and processes in this model and consider it to be
an embedding field network only. Before we do so, there is one further
neurophysiological phenomenon which occurs in nervous systems to
consider: inhibition. At the microscopic level, inhibition consists of
depressing the cell body membrane potential below the resting potential.
Figure 8.6.1 shows a common inhibiting arrangement in the spinal column
of vertebrates. The two large light neurons are motoneurons. The dark
neuron is a Renshaw cell. The sequence of events shown on the traces is
as follows: The cell body of motoneuron $V_1$ is excited by a spike. Its
membrane potential rises with an EPSP and a reflected spike. This spike
is propagated down $V_1$'s axon. A collateral breaks off of this axon and
synapses on the Renshaw cell's body. Arrival of the spike at this
synapse excites the Renshaw cell body, which fires a burst of spikes.
These spikes propagate up the Renshaw cell's axon. The Renshaw cell's
axon breaks up into two collaterals. One synapses on the $V_1$ cell body
and another synapses on the $V_2$ cell body. When the burst of spikes
arrives at these synapses, inhibitory transmitter is released. The
inhibitory transmitter causes a decrease in the membrane potentials of
$V_1$ and $V_2$ below resting potential. The membrane potential traces
which are below resting potential are called inhibitory post synaptic
potentials or IPSP's.

[Figure 8.6.1. A common inhibiting arrangement in the spinal column of
vertebrates. The light neurons are motoneurons; the dark neuron is a
Renshaw cell.]

The important things we want to note from figure 8.6.1 are:

(a) The Renshaw cell's body membrane potential increases in the
positive direction when it is excited.

(b) The spikes propagated along the Renshaw cell's axon are
similar to the spikes along the motoneurons' axons. In particular
they are increases in the positive direction of the axon's membrane
potential.

(c) A transmitter substance is released by these spikes. It
causes a decrease in the motoneuron's cell body membrane potential.
This decrease in membrane potential does not cause any change in the
motoneuron's axon membrane potentials.

These facts show that there is no negative membrane potential
propagated anywhere in the system. All propagating signals are positive
signals. In the discussion of allowable prediction signal states in
section 7.2, we did not allow the propagation of negative amplitude
prediction signals. We made this restriction on the grounds of
consistency and the fact that negative amplitude prediction signals
were not needed in an outstar. In the nervous systems of living
organisms, negative amplitude "prediction" signals do not occur. Thus
our restriction on the allowable states of prediction signals in embedding
field networks is consistent with neurophysiological data.





The inhibitory transmitter substance released by the Renshaw
cell's burst of spikes is considered to be a chemical substance that
is different from the excitatory transmitter which excites the
motoneuron's cell body. There are at least three chemical substances
which act as transmitters in nervous systems. They are acetylcholine,
epinephrine, and norepinephrine. In one part of the body, and with one
species of neuron, one of the substances may act as an excitatory
transmitter and another may act as an inhibitory transmitter. In
another part of the body and with another species of neuron, their
effects may reverse.

With these few facts in mind, we will now invent a simplistic
model for inhibition which we shall add to our previous model. Firstly,
we will postulate an inhibitory transmitter substance H which is
different from our excitatory transmitter substance E. Since H and
E may reverse their roles in other parts of the nervous system, we
want the processes for production and release of H to be similar to
those for E. Therefore we will assume that H substance is stored in
the synaptic cleft. It is released when the adjacent bouton membrane
potential is increasing, i.e., when $\dot{x}_i(t-\tau) > 0$. We will
further assume that the release of one mole of H will result in an
instantaneous increase in the adjacent cell body membrane potential of
$\delta$ volts.

Note that this is an increase of $\delta$ volts. We have specified that
the release of one mole of E will result in an instantaneous cell body
membrane potential increase of a volts. By specifying a or $\delta$
positive or negative, we can specify their effects in various parts of
our system. However, normally $\delta$ will be assigned a negative value.


We must now invent a process which will produce H substance from
chemical substances available in the synaptic cleft. To do this we
must look closely at the Renshaw cell bouton-motoneuron synapse in
figure 8.6.1. The effect of H substance is a decrease in the
motoneuron's cell body membrane potential. This is caused by an increase
in the cell body membrane's permeability to K+ and Cl- ions. With
the sodium pump working to eject Na+, the net effect is an increase
of K+ ions inside the cell body. Remember that we allowed C substance
to be released when the cell body membrane potential was above
resting potential, but decreasing. When the cell body membrane potential
is above resting potential but decreasing, K+ ions are diffusing
out of the cell body. Thus we have sort of tied the release of C
substance to the diffusion of K+ ions out of the cell body. Now,
when the cell body's membrane potential is decreasing below resting
potential, K+ ions are diffusing into the cell body. Thus we may
assume that no C substance is being released into the synaptic cleft
when the cell body membrane potential is decreasing below resting
potential. We will make the further assumption that no C substance is
released at any time when the cell body's membrane potential is below
resting potential. Thus C substance cannot be involved in the production
of H substance. We could postulate another chemical substance which
is released from the cell body into the synaptic cleft when the cell
body membrane potential is below resting potential. This is a valid
option, but we will not investigate it further.

Since the Renshaw cells' spikes are the same as all other spikes,
B substance is being released from the Renshaw cells' boutons. Thus
B substance could be a reactant in the production of H. Suppose that
there is a substance, S, which is always present in large quantities
in the synaptic cleft. Suppose further that S substance reacts with
B substance according to:

    1·B + 1·S = 1·H     (8.6.1)

Suppose further that this reaction is fast, but not as fast as
the reaction producing E substance. Then excitation of a bouton with
a spike will release B substance. If there is C substance present in
the cleft, then $[\min(b(t), c(t))]^+$ of E substance will be produced.
If there is any B left over after this reaction, it will combine
with S to form H. In the experiments of section 8.5 we saw that an
accumulation of B in the cleft is not necessary for learning. (The
accumulation of B in those experiments was always zero because the
deactivation rate for B, $\omega_b$, was infinite.) Further, if we make
this postulation, then the logic governing the performance of the elements
in the network will be an &. logic. 
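
The division of released B between the two reactions can be sketched as follows. This Python fragment is only an illustration of the rule described in the preceding paragraph: E production takes its share first, and whatever B remains combines with S. The function name `split_b` is an assumption introduced here, and S is assumed to be plentiful enough never to limit the reaction.

```python
def split_b(b, c):
    """Divide an amount b of B between E and H production.

    E production uses [min(b, c)]^+ of the B (it needs C as a partner).
    Any B left over combines with the plentiful S substance to form H.
    Returns (e_made, h_made).
    """
    e_made = min(b, c) if (b > 0.0 and c > 0.0) else 0.0
    h_made = max(b - e_made, 0.0)
    return e_made, h_made
```

With no C in the cleft, all of the released B goes to H; with C in excess, all of it goes to E.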


3 4 


An 3 logic conforms to the following tabulation: 


Table 8.6.1

    x_i     x_j     transmitter produced (+1 = E, -1 = H, 0 = none)
     0       0       0
     0      +1       0
    +1       0      -1
    +1      +1      +1
    +1      -1      -1
     0      -1       0


In the current context, this tabulation means that there is no
transmitter substance, E or H, produced when the bouton membrane potential
is at resting potential, or $x_i = 0$. This is independent of whatever
the adjacent cell body membrane potential may be. However, when the
bouton membrane potential is above resting potential, there are three
cases: When the adjacent cell body membrane potential is at or below
resting potential, inhibitory H transmitter substance is formed. If
the adjacent cell body membrane potential is above resting potential,
but decreasing, then excitatory E substance is formed.

Thus the reaction 1·B + 1·S = 1·H accomplishes one of the stated
aims of this chapter - the implementation of an 4, logic. We will 
therefore adopt it as the chemical reaction producing H substance 
in the model. 
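
The tabulation can be restated as a small decision rule. The Python sketch below is an illustration only; the function name `transmitter_sign` and the list `TABLE` are assumptions introduced here, and a cell body potential above resting potential is assumed to be decreasing, as in the case described in the text.

```python
def transmitter_sign(x_bouton, x_cell):
    """Return +1 if excitatory E is formed, -1 if inhibitory H is formed,
    and 0 if no transmitter is formed.

    x_bouton -- bouton potential relative to rest (0 means at rest)
    x_cell   -- adjacent cell body potential relative to rest
    """
    if x_bouton <= 0:
        return 0                       # bouton at rest: nothing is produced
    return 1 if x_cell > 0 else -1     # above rest gives E, otherwise H


# Rows of the tabulation as (bouton, cell body, transmitter).
TABLE = [(0, 0, 0), (0, 1, 0), (1, 0, -1), (1, 1, 1), (1, -1, -1), (0, -1, 0)]
```

Each row of `TABLE` is reproduced by `transmitter_sign`.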

The alert reader may have noticed that we have already accomplished
the third aim of this chapter. We have already invented a process which
does not cause "pulse lengthening" of a signal being transmitted through
a neuron or embedding field node cascade. Consider a cascade of N
neurons, $V_1, V_2, \ldots, V_N$. The $V_{j-1}$ neuron synapses on the
$V_j$ neuron. The $V_1$ neuron is the "starting" neuron. Suppose that
each of the j-1,j synaptic clefts in the cascade contains A moles of E
substance. Let the E effectiveness factor, a, be a = 1.0. For simplicity
let the H effectiveness factor, $\delta$, be $\delta$ = 0, so that we do
not have to worry about inhibition. Let the "starting" neuron, $V_1$, be
excited by an input impulse of amplitude A at time $t_0$. Then:

$$x_1(t) = \begin{cases} 0 & \text{for } t < t_0 \\ A\,e^{-\alpha[t-t_0]} & \text{for } t \ge t_0 \end{cases}$$

This signal will arrive at the 1,2 synapse at $t_0 + \tau$. It will cause
the release of all the E substance present. Thus:

$$x_2(t) = \begin{cases} 0 & \text{for } t < t_0 + \tau \\ A\,e^{-\alpha[t-t_0-\tau]} & \text{for } t \ge t_0 + \tau \end{cases}$$

Since $x_2(t)$ and $x_1(t-\tau)$ are identical, A moles of E will be
produced in the 1,2 cleft by the E production process after
$t = t_0 + \tau$. The same argument holds for any pair of neurons,
$V_{j-1}$, $V_j$, in the cascade. Thus:

$$x_j(t) = \begin{cases} 0 & \text{for } t < t_0 + (j-1)\tau \\ A\,e^{-\alpha[t-t_0-(j-1)\tau]} & \text{for } t \ge t_0 + (j-1)\tau \end{cases}$$

Except for the time delay, $(j-1)\tau$, the signal is transmitted through
the cascade unchanged. There is no "pulse lengthening". Additionally,
the self-sustaining property of the E production insures that we can
propagate any number of signals through the cascade without distortion.
(Note, this last statement is true only if there is a time interval
between consecutive signals which is large enough to allow the E
production process to produce approximately A moles of E before the next
signal is started at the "starting" neuron. In practice, making this
interval $3/\alpha$ seconds is sufficient.)
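
The cascade argument can be checked numerically. The Python sketch below assumes the exponential response just described, with the j-th neuron starting its response (j - 1) delays after the starting neuron, and it assumes that E is rebuilt as $A(1 - e^{-\alpha t})$ after a release, as in the analysis of section 8.5. The function names are assumptions introduced for illustration.

```python
import math


def cascade_potential(j, t, amplitude, alpha, tau, t0=0.0):
    """Membrane potential of the j-th neuron (j = 1 is the starting neuron)."""
    onset = t0 + (j - 1) * tau
    if t < onset:
        return 0.0
    return amplitude * math.exp(-alpha * (t - onset))


def fraction_restored(interval, alpha):
    """Fraction of the A moles of E rebuilt `interval` seconds after release."""
    return 1.0 - math.exp(-alpha * interval)
```

Under these assumptions the response of any neuron in the cascade is the starting response shifted in time, and an interval of three time constants restores about 95 percent of the E, which is one way to read the remark that this spacing is sufficient in practice.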

The reason that such a cascade does not distort a signal is simple.
The input signal to the "starting" neuron is an impulse. The "prediction"
input signal to all the cell bodies in the cascade is also an impulse.
This is because the effect of the release of A moles of E in a synaptic
cleft is an instantaneous increase of the adjacent cell body membrane
potential of A volts. The effect of an input impulse is an instantaneous
increase of the cell body membrane potential of A volts. Thus the
effect of an input impulse and a "prediction" excitement are the same.

Having modeled an arbitrary mechanism, we will now drop the
neurophysiological names assigned to the elements and processes of the
nodes and replace them with embedding field names. To do so, we must
add a "synaptic cleft" between the arrowheads of the embedding field
theory and the adjacent node. This is added to give us a definite
place for the chemical reactions we have invented to occur. We will
denote the synaptic cleft between the a arrownead and the V. node 
by O5 « Because cur model works according to chemical reactions, we 
will call a network composed of elements from this model a chemical 
embedding field network, We list here a complete description of the 
processes, There are several new variables in the following equations. 
They are defined after the equations. 
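Equations for the chemical embedding field network processes:

8.6.2
\[
\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + a \sum_j R(\dot{p}_j(t-\tau))\, z_{ji}(t) + \beta \sum_j R(\dot{p}_j(t-\tau))\, h_{ji}(t)
\]

where $R(\dot{p}_j(t-\tau))\, y(t)$ is a special function defined by:

\[
R(\dot{p}_j(t-\tau))\, y(t) = \begin{cases} 0 & \text{if } \dot{p}_j(t-\tau) \le 0 \\ \text{an impulse of amplitude } y(t) & \text{when } \dot{p}_j(t-\tau) > 0 \end{cases}
\]

8.6.3
\[
p_j(t-\tau) = [x_j(t-\tau)]^+
\]

8.6.4
\[
\dot{z}_{ji}(t) = [\min(b_{ji}(t), c_{ji}(t))]^+ - R(\dot{p}_j(t-\tau))\, z_{ji}(t)
\]

where:

\[
[\min(x, y)]^+ = \begin{cases} x & \text{if } x \le y \text{ and } x > 0 \\ y & \text{if } y < x \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}
\]

8.6.5
\[
\dot{h}_{ji}(t) = \left[ b_{ji}(t) - [\min(b_{ji}(t), c_{ji}(t))]^+ \right] - R(\dot{p}_j(t-\tau))\, h_{ji}(t)
\]

8.6.6
\[
\dot{b}_{ji}(t) = [-\dot{p}_j(t-\tau)]^+ - [\min(b_{ji}(t), c_{ji}(t))]^+
\]

where:

\[
[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}
\]

8.6.7
\[
\dot{c}_{ji}(t) = \left[ -\frac{d}{dt}[x_i(t)]^+ \right]^+ - w\,[c_{ji}(t) - b_{ji}(t)]^+ - [\min(b_{ji}(t), c_{ji}(t))]^+
\]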





Definition of the variables:

$x_i(t)$ is the conventional x process which occurs at node $V_i$.

$p_j(t-\tau)$ is the prediction signal at the arrowheads connected
to node $V_j$ by directed edges. Since we do not allow negative amplitudes
for prediction signals, $p_j(t-\tau) = [x_j(t-\tau)]^+$. Only the first
derivative, $\dot{p}_j(t-\tau)$, is used in the above equations.

$P_i(t)$ is the conventional event input impulse. In this study of
the chemical embedding field networks, $P_i(t)$ will be constrained to
be an impulse of amplitude A.

$z_{ji}(t)$ is the amount of excitory transmitter substance, E, in the
$S_{ji}$ synaptic cleft.

$h_{ji}(t)$ is the amount of inhibitory transmitter substance, H, in
the $S_{ji}$ synaptic cleft.

$b_{ji}(t)$ is the amount of B substance in the $S_{ji}$ synaptic cleft.

$c_{ji}(t)$ is the amount of C substance in the $S_{ji}$ synaptic cleft.


Definition of the constants:

$\alpha$ is the decay rate for x processes.

$a$ is the effectiveness factor for E substance. Release of 1 unit
of E in the synaptic cleft will result in an instantaneous increase
in the amplitude of the adjacent node's x process of $a$.

$\beta$ is the effectiveness factor for H substance. Release of 1 unit
of H in the synaptic cleft will result in an instantaneous increase in
the amplitude of the adjacent node's x process of $\beta$. $\beta$ will have
negative values throughout the rest of this study.

$\tau$ is the transmission delay due to finite transmission velocities
on directed edges. A signal which originates at the i-th node at time $t_0$
will arrive at the arrowheads connected to this node at time $t_0 + \tau$.

$w$ is the rate constant for deactivation of C substance.


Discussion:

Equations 8.6.2 through 8.6.7 are a mathematical description of
the processes we have invented in this chapter. They are different
from equations 8.4.15 through 8.4.18 because they include the addition
of the inhibitory processes.

The functions $a \sum_j R(\dot{p}_j(t-\tau))\, z_{ji}(t)$ and
$\beta \sum_j R(\dot{p}_j(t-\tau))\, h_{ji}(t)$ in equation 8.6.2 say that
when the prediction signal $p_j(t-\tau)$ arriving at the $N_{ji}$ arrowhead
is increasing, all the E and H substances in the $S_{ji}$ synaptic cleft
are released instantly. The release of these substances at time $t_0$
causes an instant increase in the amplitude of the adjacent $x_i(t)$
process of $a z_{ji}(t_0) + \beta h_{ji}(t_0)$.

E substance is produced in the synaptic cleft according to the
instantaneous reaction:

\[
1B + 1C \rightarrow 1E
\]

Because of the unit coefficients in this equation, the maximum amount of E
that can be produced at any time is the minimum of the reactants
available. Equation 8.6.6 says that the amount of B being released
into the $S_{ji}$ cleft per second is $[-\dot{p}_j(t-\tau)]^+$. That is, the
amount of B being released from the $N_{ji}$ arrowhead into the $S_{ji}$ cleft
is directly proportional to the decrease per second in the amplitude of
the prediction signal at the $N_{ji}$ arrowhead. The B substance thus released
is first made available for reaction with C to form E. If there is
any B left over after this reaction, it reacts with S substance to form
H substance. S substance is always present in large quantities in the
cleft. Equation 8.6.5 says this mathematically.
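The way released B is shared between the E and H reactions can be sketched with two small helper functions. The names `pos`, `min_pos`, and `split_b` are hypothetical and are introduced only for this illustration; they mirror the bracket operators $[y]^+$ and $[\min(x, y)]^+$ used in the equations, under the assumption that any B not matched by C goes to H production.

```python
def pos(y):
    """[y]^+ : the value itself when positive, otherwise zero."""
    return y if y > 0 else 0.0

def min_pos(x, y):
    """[min(x, y)]^+ : the smaller value when both are positive, else zero."""
    return min(x, y) if (x > 0 and y > 0) else 0.0

def split_b(b, c):
    """Share an amount b of B between E production and H production.

    B reacts first with the available C to form E; any B left over
    reacts with the abundant S substance to form H.
    """
    e_formed = min_pos(b, c)
    h_formed = pos(b - e_formed)
    return e_formed, h_formed
```

For example, `split_b(5.0, 3.0)` gives 3 units of E and 2 units of H, while `split_b(5.0, 0.0)` sends all 5 units of B to H.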

The amount of C released into the $S_{ji}$ cleft per second is directly
proportional to the decrease per second in the amplitude of the adjacent
$V_i$ node's x process, provided that x process is positive. The term
$\left[ -\frac{d}{dt}[x_i(t)]^+ \right]^+$ in equation 8.6.7 says this. The amount of C present
in the cleft is first made available to react with any B present to
form E. If there is any C left over after this reaction, it is
deactivated at rate $w$. Equation 8.6.7 states this mathematically.

Although equations 8.6.3 through 8.6.7 are complicated and
describe a complicated set of simultaneous processes, they are fairly
straightforward to simulate on a digital computer. In the next
section, we shall study an outstar network governed by these equations.





Section 8.7 A Chemical Outstar

An outstar composed of chemical embedding field elements was
set up. The standard experiment that has been performed in the other
outstars studied was performed. The events input to the nodes were
specified to be impulses of amplitude A = 10. From equations 8.6.2
through 8.6.7, there are five network parameters to be specified:
$\alpha$, $\tau$, $a$, $\beta$, and $w$. $\alpha$ and $\tau$ were specified as in the past:

$\alpha = 3.333\ \text{sec}^{-1}$
cc 


| 


We SEC. 
The deactivation rate for C substance, $w$, was arbitrarily specified
to be:

$w = 0.5\ \text{sec}^{-1}$

Since an excitory transmitter (E) substance effectiveness factor
of $a = 1.0$ has resulted in self-sustaining systems in the past, $a$ was
specified to be:

$a = 1.0$

The specification of the new inhibitory transmitter (H) substance
effectiveness factor, $\beta$, will require some discussion. The chemical
outstar conforms to logic & 3 tabulated in table 8.6.1, The three 
assignments in that table which can cause the "z" processes to be
driven to non-ambient states are:


8.7.1 A(R =H, % =H) a4 
3 Cc a 

Se7ac f 46% winx, = 0) = =1 

On. 3 A 3% =) t= =i eet 


In the current context, 8.7.1 says that an excited prediction signal
at an arrowhead and an excited x process at the adjacent node results
in the production of E substance. This is equivalent to driving a
"z" process in the excitory direction. The other two assignments say
that when the prediction signal is in an excited state and the adjacent
node x process is in an ambient or inhibited state, H substance will be
produced. This is equivalent to driving a conventional "z" process
in the inhibitory direction. In the last chapter, we introduced the
idea that an outstar may have its grid and command nodes randomly
excited before it "goes to school" to learn a pattern. According to
the A, logic, random excitement of the grid nodes can not change the 
"2" process state. However, random excitement of the command node 
can result in the outstar learning to directly inhibit all the grid
nodes according to assignments 8.7.2 and 8.7.3. In a real environment
we can expect this to be the case before the outstar "goes to school".
Thus the outstar will be inhibitorally biased before we try to teach
it a pattern. We must insure that this inhibitory biasing is not so
great as to prevent the outstar from learning a pattern.

To facilitate this discussion, we will prove the following lemma:

Lemma 8.7.1

Let a node $V_1$ have an arrowhead $N_{12}$ impinging on another node $V_2$.
Let the functions $z_{12}(t) = h_{12}(t) = 0$. Let node $V_1$ be excited by a
positive impulse of amplitude $A_1$ at time $t_1$. Let node $V_2$ be excited at
time $t_2 = t_1 + \tau$ by an input impulse of amplitude $A_2$ which may be
negative. Then the amount of H substance in the $S_{12}$ synaptic cleft
at times $t \gg t_2 = t_1 + \tau$ is:

\[
h_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } 0 < A_1 \le A_2 \\ A_1 - A_2 & \text{if } 0 < A_2 < A_1 \\ A_1 & \text{if } A_2 \le 0 \end{cases}
\]

And the amount of E substance in the $S_{12}$ synaptic cleft at times
$t \gg t_1 + \tau = t_2$ is:

\[
z_{12}(t \gg t_2) = \begin{cases} A_1 & \text{if } 0 < A_1 \le A_2 \\ A_2 & \text{if } 0 < A_2 < A_1 \\ 0 & \text{if } A_2 \le 0 \end{cases}
\]


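Proof:

The input impulse to node $V_1$ results in a prediction signal arriving
at the arrowhead at time $t_1 + \tau = t_2$ which is:

\[
p_1(t-\tau) = \begin{cases} 0 & \text{for } t < t_2 \\ A_1 e^{-\alpha (t - t_2)} & \text{for } t \ge t_2 \end{cases}
\]

The input impulse to node $V_2$ results in an $x_2(t)$ process:

\[
x_2(t) = \begin{cases} 0 & \text{for } t < t_2 \\ A_2 e^{-\alpha (t - t_2)} & \text{for } t \ge t_2 \end{cases}
\]

The amount of B substance released into $S_{12}$ per second is:

\[
\dot{b}_{12}(t) = [-\dot{p}_1(t-\tau)]^+ = \begin{cases} 0 & \text{for } t < t_2 \\ A_1 \alpha e^{-\alpha (t - t_2)} & \text{for } t \ge t_2 \end{cases}
\]

The amount of C substance released into $S_{12}$ per second is:

\[
\dot{c}_{12}(t) = \left[ -\frac{d}{dt}[x_2(t)]^+ \right]^+ = \begin{cases} 0 & \text{for } t < t_2 \\ 0 & \text{for } t \ge t_2 \text{ if } A_2 \le 0 \\ A_2 \alpha e^{-\alpha (t - t_2)} & \text{for } t \ge t_2 \text{ if } A_2 > 0 \end{cases}
\]

Thus the amount of E being formed is:

\[
z_{12}(t) = \int [\min(\dot{b}_{12}(t), \dot{c}_{12}(t))]^+ \, dt = \begin{cases} 0 & \text{for } t \ge t_2 \text{ if } A_2 \le 0 \\ A_1 (1 - e^{-\alpha (t - t_2)}) & \text{for } t \ge t_2 \text{ if } 0 < A_1 \le A_2 \\ A_2 (1 - e^{-\alpha (t - t_2)}) & \text{for } t \ge t_2 \text{ if } 0 < A_2 < A_1 \end{cases}
\]

Thus for $t \gg t_2$:

\[
z_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } A_2 \le 0 \\ A_1 & \text{if } 0 < A_1 \le A_2 \\ A_2 & \text{if } 0 < A_2 < A_1 \end{cases}
\]

The amount of H being formed is equal to the amount of B left over
after the E production reaction.

\[
h_{12}(t) = \int \left[ \dot{b}_{12}(t) - [\min(\dot{b}_{12}(t), \dot{c}_{12}(t))]^+ \right] dt = \begin{cases} 0 & \text{if } 0 < A_1 \le A_2 \text{ for } t \ge t_2 \\ (A_1 - A_2)(1 - e^{-\alpha (t - t_2)}) & \text{if } 0 < A_2 < A_1 \text{ for } t \ge t_2 \\ A_1 (1 - e^{-\alpha (t - t_2)}) & \text{if } A_2 \le 0 \text{ for } t \ge t_2 \end{cases}
\]

Thus for $t \gg t_2$:

\[
h_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } 0 < A_1 \le A_2 \\ A_1 - A_2 & \text{if } 0 < A_2 < A_1 \\ A_1 & \text{if } A_2 \le 0 \end{cases}
\]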


Note that by lemma 8.7.1, $h_{12}(t \gg t_2) + z_{12}(t \gg t_2) = A_1$. Also
note that immediately after arrival of a prediction signal at the $N_{12}$
arrowhead, $h_{12}(t) = z_{12}(t) = 0$. Thus lemma 8.7.1 applies to the situations
where there is H and/or E substance present in $S_{12}$ before arrival of
a prediction signal. Since arrival of a prediction signal causes
the equivalent of input impulses of amplitudes $a z_{12}(t)$ and $\beta h_{12}(t)$
to be delivered to $V_2$, this lemma can be used in all cases by setting
$A_2 = a z_{12}(t) + \beta h_{12}(t) + A_e$, where $A_e$ is the amplitude of an external
input impulse, if any.
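The long-time amounts stated in lemma 8.7.1 can be written as a short function. This is a sketch for checking the three cases, not part of the original simulation; the function name `lemma_8_7_1` is hypothetical.

```python
def lemma_8_7_1(a1, a2):
    """Return (z12, h12), the amounts of E and H in the cleft for t >> t2.

    a1 is the positive impulse amplitude delivered to V1.
    a2 is the impulse amplitude delivered to V2 and may be negative.
    """
    if a1 <= 0:
        raise ValueError("the lemma assumes a positive impulse at V1")
    z12 = min(a1, max(a2, 0.0))   # E is limited by the scarcer of B and C
    h12 = a1 - z12                # all remaining B ends up as H
    return z12, h12
```

In every case the two amounts sum to `a1`, which matches the note following the lemma.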

Now, suppose we start our outstar in a state of initial ignorance.
That is, $z_{ji}(0) = h_{ji}(0) = 0$. Let $0 > \beta > -1$. We then excite the
command node with an input impulse of amplitude A without exciting the
grid nodes. By lemma 8.7.1, $h_{ji}(t) = A$ and $z_{ji}(t) = 0$. Suppose we
excite the command node again without exciting the grid nodes. When
the prediction signal arrives at the $N_{ji}$ arrowheads, all the transmitter
substance is released. Thus the grid nodes are excited by impulses
of amplitude $\beta A < 0$. Then by lemma 8.7.1, A units of H will be produced
in the synaptic clefts, $S_{ji}$. Thus, before the outstar "goes to school",
the synaptic clefts contain A units of H and 0 units of E.

Now let the outstar "go to school". The command node and the grid
nodes are excited with input impulses of amplitude A. The command
prediction signal will cause a further impulse of amplitude $\beta A < 0$
to excite each of the grid nodes. Thus the grid nodes will be excited
by a total input of $A(1 + \beta)$. Thus $A(1 + \beta)$ of E will be produced.
(Remember that $0 > \beta > -1$.)

Suppose we want the outstar to be able to directly inhibit a single
occurrence of a random mistake. To the outstar, the first presentation
of the pattern after going to school is considered a random mistake.
Thus we want as much of H produced as E. On this criterion, $\beta = -0.5$
is specified. Now, let us present the pattern a second time. The total
input impulse amplitude to the grid nodes in the pattern will be
$A(1 + \beta) + A\beta + A = 0.5A - 0.5A + A = A$. Thus A units of E will be
produced on the second presentation of the pattern. 0 units of H will
be produced.

Thus by specifying $\beta = -0.5$, we will have an outstar that is
resistant to single occurrences of random mistakes, but will learn
a pattern well in two presentations. Therefore, for the experiment,
$\beta$ is specified to be:

$\beta = -0.5$
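The balance criterion used to choose $\beta$ can be checked with a few lines of arithmetic. The sketch below assumes $a = 1$, assumes the cleft holds A units of H and no E before the first lesson, and applies lemma 8.7.1 to the first presentation only; the function names are hypothetical.

```python
def lemma_8_7_1(a1, a2):
    """Long-time (E, H) amounts from lemma 8.7.1 (a1 > 0, a2 may be negative)."""
    z = min(a1, max(a2, 0.0))
    return z, a1 - z

def first_presentation(A, beta):
    """E and H formed on the first lesson of an inhibitorally biased outstar."""
    released = beta * A            # the biased cleft releases A units of H and no E
    total_input = A + released     # external event impulse plus the released H
    return lemma_8_7_1(A, total_input)

# With A = 10 and beta = -0.5 the E and H amounts come out equal.
e_units, h_units = first_presentation(10.0, -0.5)
```

A less negative choice such as `beta = -0.25` gives more E than H on the first lesson, so a single random mistake would no longer be balanced by inhibition.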
Figure 8.7.1. Teaching a chemical outstar a pattern, with inhibitory biasing before the outstar "goes to school". (Traces of P(t), x(t), c(t), z(t), and h(t) against time in seconds.)





Figure 8.7.1 shows the results of the first part of the experiment.
The command node is excited once alone at the beginning of the
experiment. The z traces show that no E was produced in the synaptic
clefts. The h traces show that 10 units of H were produced in the clefts.
Thus the outstar is inhibitorally biased before "going to school".
"School" begins with the second command node excitement. Event 2 is
presented exactly when the command prediction signal arrives at the
arrowheads. Event 3 is presented $2/\alpha = 0.6$ seconds later. The
pattern is presented twice.

In both presentations of the pattern, significantly more H than E is
produced in the cleft adjacent to $V_3$. Since event 1 is not presented,
10 units of H are produced in the cleft adjacent to $V_1$. No E is produced
in that cleft. On the first presentation of the pattern, the amounts of
E and H produced in the cleft adjacent to $V_2$ approximately balance. On
the second presentation of the pattern, 10 units of E are produced in
that cleft and no H is produced.

The fourth excitement of the command node results in a prediction
excitement of the grid. The third response on the grid x traces is
this prediction excitement of the grid. From the results we can conclude
that the outstar has learned the pattern at $V_2$. It has also learned
to directly inhibit grid nodes $V_1$ and $V_3$.

The experiment was continued to test a random mistake in the
previously learned pattern. Figure 8.7.2 shows the results.
The direct inhibition of $V_1$ which the outstar had previously learned
caused $x_1(t)$ to rise to a value of only 5. (The input impulse, $P_1(t)$,
has an amplitude of 10.) The amounts of H and E produced in the
cleft adjacent to $V_1$ approximately balance. Thus when a prediction is
excited by the second excitement of the command node, $x_1(t)$ rises to
only a slight positive value. The amount of H produced in that cleft
during the prediction excitement is considerably more than the E
produced. Further prediction excitements will result in inhibited
amplitudes for $x_1(t)$. We may conclude that the outstar has good
resistance to random mistakes.

Figure 8.7.2. Resistance to random mistakes in a chemical outstar. (Traces of P(t), x(t), c(t), z(t), and h(t) against time in seconds.)

Figure 8.7.3. An unsuccessful attempt to correct a previously learned pattern in a chemical outstar. (Traces of P(t), x(t), c(t), z(t), and h(t) against time in seconds.)

The experiment was continued to test the correctability of the
outstar. The correcting pattern was presented twice. Figure
8.7.3 shows the results. Although the outstar did learn the correcting
pattern, it did not "unlearn" the previously learned pattern.
There is no "forgetting rate" in the chemical outstar. Thus the old
pattern can not be forgotten. There is also no lateral inhibition
in this outstar. Thus, this chemical outstar lacks the two mechanisms
whereby previously learned patterns can be removed from its memory.
This is a major drawback in this outstar. Further work with it would
require investigations of the effects of a finite forgetting rate for
the E and H substances in the synaptic clefts. Additionally, the effects
of lateral inhibition should be investigated.





APPENDIX A

The Digital Simulation and its Accuracy

The equations which were simulated in this thesis were simultaneous
nonlinear differential difference equations. They fell into
three basic types:

A.1    $\dot{x}(t) = -\alpha x(t) + I_x(t)$
A.2    $\dot{y}(t) = -\alpha y(t) + z(t)\,x(t-\tau) + I_y(t)$
A.3    $\dot{z}(t) = -u z(t) + y(t)\,x(t-\tau)$

Figure A.1 shows a system flow diagram for this set of equations.

The key to the digital simulation is the algorithm used for the
integrators. This thesis used a simple Euler rule algorithm. That
is, the integral:

\[
r(t) = \int \dot{r}(t)\, dt
\]

was simulated by the algebraic equation:

\[
r(t + h) = r(t) + \dot{r}(t)\, h
\]

where h is the digital increment.

The Euler rule algorithm was adopted because it is easy to program
on a high speed digital computer and requires comparatively little
computation. The large number of experiments simulated
in this thesis required efficient use of computation time. Most of the
experiments involved at least seven variables and required over fifty
increments. Thus the simplest and fastest integration algorithm was
selected.
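The Euler rule above can be sketched for the simplest equation type, A.1, driven by a unit step input. This is an illustrative reconstruction in Python rather than the original FOCAL program; the function name `euler_step_response` is hypothetical.

```python
def euler_step_response(alpha, h, n_steps):
    """Integrate dx/dt = -alpha * x + 1 from x(0) = 0 with the Euler rule.

    Returns the list [x(0), x(h), x(2h), ...] with n_steps + 1 entries.
    """
    x = 0.0
    values = [x]
    for _ in range(n_steps):
        derivative = -alpha * x + 1.0   # right-hand side of A.1 with a unit step input
        x = x + derivative * h          # r(t + h) = r(t) + r'(t) h
        values.append(x)
    return values

# One of the parameter choices used in the study: alpha = 3.3333, h = 0.1.
response = euler_step_response(3.3333, 0.1, 10)
```

Each sample of this recursion equals $(1/\alpha)(1 - (1 - \alpha h)^n)$, which is the closed form that the sampled-data analysis of the next paragraphs works with.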

Figure A.1. A signal flow diagram for the simulation.

The sampled-data "z" transform for the equation:

A.4    $\dot{x}(t) = -\alpha x(t) + I_x(t)$

using an Euler rule integration algorithm is:

\[
F(z) = \frac{h}{z - 1 + h\alpha}\, I_x(z)
\]

For $I_x(t) = u_{-1}(t)$, where:

\[
u_{-1}(t) = \begin{cases} 0 & \text{for } t < 0 \\ 1 & \text{for } t \ge 0 \end{cases}
\]

$F(z)$ is:

\[
F(z) = \frac{h}{z - 1 + h\alpha} \cdot \frac{z}{z - 1} = \frac{1/\alpha}{z - 1} + \frac{h - 1/\alpha}{z - 1 + h\alpha}
\]

The time varying function which this transforms to is:

\[
f(t) = \frac{1}{\alpha}\, u_{-1}(t - h) + \left( h - \frac{1}{\alpha} \right) e^{-\beta (t - h)} \quad \text{for } t \ge h
\]

where:

\[
\beta = -\frac{1}{h} \ln(1 - h\alpha)
\]

The continuous solution to A.4 when $I_x(t) = u_{-1}(t)$ is:

\[
x(t) = \frac{1}{\alpha} \left( 1 - e^{-\alpha t} \right) \quad \text{for } t \ge 0
\]

For $t \ge h$, the ratio:

A.5
\[
\frac{f(t)}{x(t - h)} = \frac{1 + (\alpha h - 1)\, e^{-\beta (t - h)}}{1 - e^{-\alpha (t - h)}} \quad \text{for } t \ge h
\]

computed at $(t - h) = 1/\alpha$ was used to check the accuracy of the amplitudes
of the digitally simulated function $f(t)$. The ratio $\beta/\alpha$ was used to
check the accuracy of the simulated decay rate, $\alpha$. The two most
frequently used choices for $\alpha$ and h in this study were:

$\alpha = 3.3333$, h = 0.1

and:

$\alpha = 1.6666$, h = 0.1





The following table shows the accuracy of the simulation to a
step input:

    αh          β/α = -(1/(αh)) ln(1 - αh)     f(t)/x(t - h) at (t - h) = 1/α
    0.3333      1.170                          1.163
    0.166666    1.097                          1.087

Since all of the input pulses used in the study were of duration
$\delta = 1/\alpha$, the response of the x processes to input impulses is in error
by at most 17%. The simulated decay rates are in error by at most
17% also.

No attempt was made to analytically compute the error in the
simulated response of the x and z processes to nonlinear inputs.

The results were self-consistent and agreed qualitatively with
Grossberg's theoretical predictions. Throughout this study a
qualitative feel for the networks studied and the parameters involved
in them was the primary concern. As long as the simulation agreed
qualitatively with theoretical expectations, little concern was given
to the possibility of up to 20% amplitude errors in the computations.

The computation order and actual equations used to simulate
equations A.1 through A.3 were:

A.5    $x(t + h) = x(t) + (I_x(t) - \alpha x(t))\,h$
A.6    $y(t + h) = y(t) + (I_y(t) - \alpha y(t) + z(t)\,x(t - \tau))\,h$
A.7    $z(t + h) = z(t) + (y(t + h)\,x(t + h - \tau))\,h$

where t = hn, where n is an integer.
$\tau$ was always chosen to be an integer multiple of h. The sequence
A.5, A.6, A.7 was computed and then started again with A.5 for the next
incrementation. Thus the values for z(t) in A.6 were effectively
delayed by h.
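The computation order A.5, A.6, A.7 can be sketched as a loop. This is a Python illustration of the update sequence as printed above, not the original FOCAL code; the function name `simulate`, the argument `tau_steps` (the delay expressed as a whole number of increments), and the pulse inputs in the usage lines are assumptions made for the example.

```python
def simulate(input_x, input_y, alpha, h, tau_steps, n_steps):
    """Run the update sequence A.5, A.6, A.7 for n_steps increments.

    input_x and input_y are functions of time giving I_x(t) and I_y(t).
    tau_steps is the delay tau expressed as an integer multiple of h.
    """
    x = [0.0] * (n_steps + 1)
    y = [0.0] * (n_steps + 1)
    z = [0.0] * (n_steps + 1)

    def delayed_x(k):
        # x(t - tau) on the sample grid; zero before the signal exists
        return x[k - tau_steps] if k >= tau_steps else 0.0

    for n in range(n_steps):
        t = n * h
        x[n + 1] = x[n] + (input_x(t) - alpha * x[n]) * h                         # A.5
        y[n + 1] = y[n] + (input_y(t) - alpha * y[n] + z[n] * delayed_x(n)) * h   # A.6
        z[n + 1] = z[n] + (y[n + 1] * delayed_x(n + 1)) * h                       # A.7
    return x, y, z

# Hypothetical inputs: unit pulses lasting three increments.
pulse = lambda t: 1.0 if t < 0.25 else 0.0
xs, ys, zs = simulate(pulse, pulse, alpha=3.3333, h=0.1, tau_steps=1, n_steps=10)
```

Because A.6 uses the stored `z[n]` while A.7 already uses the freshly computed `y[n + 1]`, the z value seen by the y update lags by one increment, as noted in the text.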
The digital computer used for the simulations reported was a
Digital Equipment Corporation PDP/9 with 32K of core memory. The
programs used were programmed in the Digital Equipment Corporation's
interpretive language FOCAL. The choice to use FOCAL was made because
FOCAL allows the dimensions of matrices to be a variable that can be
specified at run time. The programs used stored the value of each of
the variables being computed after each incrementation. The stored
values were output at the end of each run. Since the number of
variables and the number of incrementations per run varied considerably,
the ability to specify matrix dimensions in the programs
immediately before the run was a great advantage.

The minimum accuracy in calculations performed by FOCAL is six
digits. Since the sampled data error was on the order of 19%, six-digit
computational accuracy was entirely sufficient.


REFERENCES

1. S. Grossberg, On the Serial Learning of Lists, Math. Biosc., 4
(1969), 201-253.

2. S. Grossberg, Some Physiological and Biochemical Consequences of
Psychological Postulates, Proc. Natl. Acad. Sci. USA 60 (1968),
758-765.

3. S. Grossberg, A Prediction Theory for Some Nonlinear Functional-
Differential Equations, II. Learning of Patterns, J. Math. Anal.
Appl. 22 (1968), 490-522.

4. S. Grossberg, On the Production and Release of Chemical Transmitters
and Related Topics in Cellular Control, J. Theoret. Biol. (1969)
22, 325-364.

5. S. Grossberg, On Learning, Information, Lateral Inhibition, and
Transmitters, Math. Biosc., 4 (1969), in press.

6. S. Grossberg, Some Networks Capable of Learning, Remembering, and
Performing any Number of Complicated Motor Sequences and Reflexes
by Respondant and Operant Conditioning.

7. S. Grossberg, Some Networks That Can Learn, Remember, and Reproduce
Any Number of Complicated Space-Time Patterns, I, J. of Math. and
Mechanics, in press.

8. S. Grossberg, Some Networks That Can Learn, Remember, and Reproduce
Any Number of Complicated Space-Time Patterns, II.

9. S. Grossberg, On Learning of Spatiotemporal Patterns by Networks
with Ordered Sensory and Motor Components: I-Excitory Components
of the Cerebellum, J. of Math. and Physics, in press.

10. S. Grossberg, Embedding Fields: A New Theory of Learning with
Physiological Interpretations, J. Math. Psych., in press.

11. White, Handler, Smith, Principles of Biochemistry, McGraw-Hill,
New York, 1968.

12. Physiology, ed. by E. E. Selkurt, Little, Brown and Company,
Boston, 1966.

13. G. G. Simpson and W. S. Beck, Life, An Introduction to Biology,
Harcourt, Brace, and World, New York, 1969.
















